Abstract

Let P be a nontrivial k-ary predicate over a finite alphabet. Consider a random CSP(P)
instance $\mathcal{I}$ over n variables with m constraints, each being P applied to k random literals. When
$m \gg n$ the instance $\mathcal{I}$ will be unsatisfiable with high probability, and the natural associated
algorithmic task is to find a refutation of $\mathcal{I}$, i.e., a certificate of unsatisfiability. When P
is the 3-ary Boolean OR predicate, this is the well-studied problem of refuting random 3-SAT
formulas; in this case, an efficient algorithm is known only when $m \gg n^{3/2}$. Understanding
the density required for average-case refutation of other predicates is of importance for various
areas of complexity, including cryptography, proof complexity, and learning theory. The main
previously known result is that for a general Boolean k-ary predicate P, having $m \gg n^{\lceil k/2 \rceil}$
random constraints suffices for efficient refutation.

In this work we give a general criterion for arbitrary k-ary predicates, one that often yields
efficient refutation algorithms at much lower densities. Specifically, if P fails to support a t-wise
independent (uniform) probability distribution ($2 \le t \le k$), then there is an efficient algorithm
that refutes random CSP(P) instances $\mathcal{I}$ with high probability, provided $m \gg n^{t/2}$. Indeed,
our algorithm will “somewhat strongly” refute $\mathcal{I}$, certifying $\mathrm{Opt}(\mathcal{I}) \le 1 - \Omega_k(1)$; if $t = k$ then
we furthermore get the strongest possible refutation, certifying $\mathrm{Opt}(\mathcal{I}) \le \mathbb{E}[P] + o(1)$. This last
result is new even in the context of random k-SAT.

Regarding the optimality of our $m \gg n^{t/2}$ density requirement, prior work on SDP hierarchies
has given some evidence that efficient refutation of random CSP(P) may be impossible
when $m \ll n^{t/2}$. Thus there is an indication our algorithm’s dependence on m is optimal for every
P, at least in the context of SDP hierarchies. Along these lines, we show that our refutation
algorithm can be carried out by the $O(1)$-round SOS SDP hierarchy.

Finally, as an application of our result, we falsify the “SRCSP assumptions” used to show 
various hardness-of-learning results in the recent (STOC 2014) work of Daniely, Linial, and 
Shalev-Shwartz. 
1 On refutation of random CSPs


Constraint satisfaction problems (CSPs) play a major role in computer science. There is a vast
theory [BJK05] of how algebraic properties of a CSP predicate affect its worst-case satisfiability
complexity; there is a similarly vast theory [Rag09] of worst-case approximability of CSPs. Finally,
there is a rich range of research (from the fields of computer science, mathematics, and physics)
on the average-case complexity of random CSPs; see [Ach09] for a survey just of random k-SAT.
This paper is concerned with random CSPs, and in particular the problem of efficiently refuting
satisfiability for random instances. This is a well-studied algorithmic task with connections to, e.g.,
proof complexity [BB02], inapproximability [Fei02], SAT-solvers [SAT], cryptography [ABW10],
learning theory [DLSS14], statistical physics [CLP02], and complexity theory [BKS13].

Historically, random CSPs are probably best studied in the case of k-SAT, $k \ge 3$. The model
here involves choosing a CNF formula $\mathcal{I}$ over n variables by drawing m clauses (ORs of k literals)
independently and uniformly at random. (The precise details of the random model are inessential;
see Section 3.1 for more information.) This is one of the best known efficient ways of generating
hard-seeming instances of NP-complete and coNP-complete problems. The computational hardness
depends crucially on the density, $\alpha = m/n$. For each k there is (conjecturally) a constant critical
density $\alpha_k$ such that $\mathcal{I}$ is satisfiable with high probability when $\alpha < \alpha_k$, and $\mathcal{I}$ is unsatisfiable with
high probability when $\alpha > \alpha_k$. (Here and throughout, “with high probability (whp)” means with
probability $1 - o(1)$ as $n \to \infty$.) This phenomenon occurs for all nontrivial random CSPs; in the
case of k-SAT it’s been rigorously proven [DSS15] for sufficiently large k.
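As a small illustration of the random k-SAT model just described, the following Python sketch draws m clauses over n variables and computes the density $\alpha = m/n$. The function names, the `seed` parameter, and the encoding of a literal as a (variable index, sign) pair are assumptions made for this sketch only; variables are drawn with replacement, which is one of the inessential modeling choices mentioned above.

```python
import random

def random_ksat(n, m, k, seed=0):
    """Draw m clauses; each clause is a list of k literals (variable index, sign)."""
    rng = random.Random(seed)
    return [[(rng.randrange(n), rng.choice((-1, 1))) for _ in range(k)]
            for _ in range(m)]

def density(clauses, n):
    """The density alpha = m / n of an instance over n variables."""
    return len(clauses) / n

def satisfied_fraction(clauses, x):
    """Fraction of clauses with at least one true literal under x in {-1,1}^n.

    Assumed convention: the literal (i, s) is true when s * x[i] == 1.
    """
    sat = sum(any(s * x[i] == 1 for i, s in clause) for clause in clauses)
    return sat / len(clauses)
```

For example, `density(random_ksat(20, 100, 3), 20)` is 5.0, and `satisfied_fraction` evaluates any candidate assignment on the drawn instance.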

There is a natural algorithmic task associated with each of the two regimes. When $\alpha < \alpha_k$ one
wants to find a satisfying assignment for $\mathcal{I}$. When $\alpha > \alpha_k$ one wants to refute $\mathcal{I}$; i.e., find a
certificate of unsatisfiability. Most heuristic SAT-solvers use DPLL-based algorithms; on unsatisfiable
instances, they produce certificates that can be viewed as refutations within the Resolution proof
system. More generally, a refutation algorithm for density $\alpha$ is any algorithm that: a) outputs
“unsatisfiable” or “fail”; b) never incorrectly outputs “unsatisfiable”; c) outputs “fail” with low
probability (i.e., probability $o(1)$).^1 Empirical work suggests that as $\alpha$ increases towards $\alpha_k$, finding
satisfying assignments becomes more difficult; and conversely, as $\alpha$ increases beyond $\alpha_k$, finding
certificates of unsatisfiability gradually becomes easier.

A seminal paper of Chvátal and Szemerédi [CS88] showed that for any sufficiently large integer c
(depending on k), a random k-SAT instance with $m = cn$ requires Resolution refutations of size
$2^{\Omega(n)}$ (whp). On the other hand, Fu [Fu96] showed that polynomial-size Resolution refutations exist
(whp) once $m \ge O(n^{k-1})$; Beame et al. [BKPS99] subsequently showed that such proofs could be
found efficiently.^2 A breakthrough came in 2001, when Goerdt and Krivelevich [GK01] abandoned
combinatorial refutations for spectral ones, showing that random k-SAT instances can be efficiently
refuted when $m \ge \widetilde{O}(n^{\lceil k/2 \rceil})$. Soon thereafter, Friedman and Goerdt [FG01] (see also [FGK05])
showed that for 3-SAT, efficient spectral refutations exist once $m \ge n^{3/2+\epsilon}$ (for any $\epsilon > 0$). These
densities for k-SAT, around $n^{3/2}$ for 3-SAT and $n^{\lceil k/2 \rceil}$ in general, have not been fundamentally
improved upon in the last 14 years.^3 (See Table 1 for a more detailed history of results in this
area.) Improving the $n^{3/2}$ bound for 3-SAT is widely regarded as a major open problem [ABW10],
with conjectures regarding its possibility going both ways [BB02, DLSS13]. See also the intriguing
work of Feige, Kim, and Ofek [FKO06] showing that polynomial-size 3-SAT refutations exist whp
once $m \ge O(n^{1.4})$.

^1 We caution the reader that in this paper we do not consider the related, but distinct, scenario of distinguishing
planted random instances from truly random ones.

^2 In this paper we use the following not-fully-standard terminology: A statement of the form “If $f(n) \ge O(g(n))$
then X” means that there exists a certain function $h(n)$, with $h(n)$ being $O(g(n))$, such that the statement “If
$f(n) \ge h(n)$ then X” is true. We also use $\widetilde{O}(f(n))$ to denote $O(f(n) \cdot \mathrm{polylog}(f(n)))$, and $O_k(f(n))$ to denote that
the hidden constant has a dependence on k (most often of the form $2^{O(k)}$).

^3 Actually, it is claimed in [GJ02] that one can obtain $n^{k/2+\epsilon}$ “along the lines of [FG01]”. On one hand,
this is true, as we’ll see in this paper. On the other hand, no proof was provided in [GJ02], and we have not found
the claim repeated in any paper subsequent to 2003.

Strong refutation. In a notable paper from 2002, Feige [Fei02] made a fruitful connection between
the hardness of refuting random 3-SAT instances and the inapproximability of certain optimization
problems that are challenging to analyze by other means. The connection is stated in terms of
$\delta$-refutation, which refers to certifying not only that a random instance $\mathcal{I}$ is unsatisfiable, but
furthermore that $\mathrm{Opt}(\mathcal{I}) \le 1 - \delta$ for some constant $\delta > 0$. Here $\mathrm{Opt}(\mathcal{I})$ denotes the maximum
fraction of simultaneously satisfiable constraints in $\mathcal{I}$. Feige specifically introduced the following
“R3SAT Hypothesis”: For all small $\delta > 0$ and large $c \in \mathbb{N}$, there is no polynomial-time
$\delta$-refutation algorithm for random 3-SAT with $m = cn$.
To stress-test Feige’s R3SAT Hypothesis, one may ask if the aforementioned refutation algorithms
for k-SAT can be improved to $\delta$-refutation algorithms. Coja-Oghlan et al. [COGL04] showed that
they can be in the case of $k = 3, 4$. Indeed, they gave algorithms for what is called strong refutation
in these cases. Here strongly refuting k-SAT refers to certifying that $\mathrm{Opt}(\mathcal{I}) \le 1 - 2^{-k} + o(1)$ (note
that $\mathrm{Opt}(\mathcal{I}) \approx 1 - 2^{-k}$ whp assuming $m \ge O(n)$).

Beyond k-SAT. As in the algebraic and approximation theories of CSP, it’s of significant interest
to consider random instances of the CSP(P) problem for general predicates $P : \{0,1\}^k \to \{0,1\}$, besides
just Boolean OR. (Though Boolean predicates are more familiar, larger domains are of interest, e.g.,
for q-colorability of k-uniform hypergraphs.) Specifically, we are interested in the question of how
properties of P affect the number of constraints needed for efficient refutation of random CSP(P)
instances. This precise question is very relevant for work in cryptography based on the candidate
OWFs and PRGs of Goldreich [Gol00]; see also [ABW10] and the survey of Applebaum [App13].
It has also proven essential for the recent exciting approach to hardness of learning due to Daniely,
Linial, and Shalev-Shwartz [DLSS13, DLSS14, DSS14]. We discuss this learning connection and our
results on it in more detail in Section 5.

The special case of random 3-XOR has proved particularly important: it is related to 3-SAT
refutation through Feige’s “3XOR Principle” (see [Fei02, FO05, FKO06]); it’s the basis for
cryptographic schemes due to Alekhnovich [Ale03] (and is related to the “Learning Parities with Noise”
problem); it’s used in the best known lower bounds for the SOS SDP hierarchy [Gri01, Sch08], which
we discuss further in Section 6; and Barak and Moitra [BM15] have shown it to be equivalent to
a certain “tensor prediction problem” in learning theory.

Prior to this work, very little was known about how the predicate P affects the complexity of
refuting random CSP(P) instances. The main known result, following from the work of Coja-Oghlan,
Cooper, and Frieze [COCF10], was the following: For any Boolean k-ary predicate P, one can
efficiently strongly refute random CSP(P) instances $\mathcal{I}$ (i.e., certify $\mathrm{Opt}(\mathcal{I}) \le \mathbb{E}[P] + o(1)$) provided
the number of constraints m satisfies $m \ge \widetilde{O}(n^{\lceil k/2 \rceil})$. In the case of k-XOR, the very recent work
of Barak and Moitra [BM15] showed how to improve this bound to $m \ge \widetilde{O}(n^{k/2})$.^4

^4 The present authors also obtained this result around the same time, but we credit the result to [BM15] as they
published earlier. With their permission we repeat the proof herein, partly because we need to prove a slightly more
general variant.
CSP | Poly-size refutations whp once $m \ge \cdots$ | Strength | Efficient / Existential | Reference
--- | --- | --- | --- | ---
k-SAT | $O(n^{k-1})$ | Refutation | Existential | [Fu96]
k-SAT | $O(n^{k-1}/\log^{k-2}(n))$ | Refutation | Efficient | [BKPS99, BKPS02]
k-SAT | $\widetilde{O}(n^{\lceil k/2 \rceil})$ | Refutation | Efficient | [GK01, FGK05]
3-SAT | $O(n^{3/2+\epsilon})$ | Refutation | Efficient | [FG01, FGK05]
k-SAT | $O(n^{k/2+\epsilon})$ | Refutation | Efficient | Claimed possible in [GJ02, GJ03]
Exactly-$k_1$-out-of-k-SAT | $O(n)$ | Refutation | Efficient | [BB02]
2-out-of-5-SAT | $O(n^{3/2+\epsilon})$ | Refutation | Efficient | [GJ02, GJ03]
NAE-3-SAT | $O(n)$ | $\Omega(1)$-refutation | Efficient | [KLP96, GJ02, GJ03]
k-SAT | $\widetilde{O}(n^{\lceil k/2 \rceil})$ | Refutation | Efficient | [COGLS04, FO05]
3-SAT | | Refutation | Efficient | [GL03]
3-SAT | | Strong | Efficient | [COGL04, COGL07]
4-SAT | $O(n^2)$ | Strong | Efficient | [COGL04, COGL07]
3-SAT | | Refutation | Efficient | [FO04]
3-SAT | $O(n^{1.4})$ | Refutation | Existential | [FKO06]
3-XOR | | $1/n^{\Omega(1)}$-refutation | Efficient | Implicit in [FKO06]
3-SAT | $\widetilde{O}(n^{3/2})$ | Refutation | Efficient | Claimed in [FKO06]
Boolean k-CSP | $\widetilde{O}(n^{\lceil k/2 \rceil})$ | Strong | Efficient | [COCF10]
k-XOR | $\widetilde{O}(n^{k/2})$ | Strong | Efficient | [BM15] (also herein)
Any k-CSP | $\widetilde{O}(n^{k/2})$ | Quasirandom ($\Rightarrow$ strong) | Efficient | This paper
Any k-CSP not supporting t-wise indep. | $\widetilde{O}(n^{t/2})$ | $\Omega_k(1)$-refutation | Efficient | This paper

Table 1: Up to logarithmic factors on m, our work subsumes all previously known results. 
2 Our results and techniques


Here we describe our main results and techniques at a high level. Precise theorem statements
appear later in the work, and the definitions of the terminology we use are given in Section 3. We
also mention that in Section B we will generalize all of our results to the case of larger alphabets;
but we’ll just discuss Boolean predicates $P : \{0,1\}^k \to \{0,1\}$ for simplicity.

Our main result gives a (possibly sharp) bound on the number of constraints needed to refute
random CSP(P) instances. Before getting to it, we first describe some more concrete results that
go into the main proof. All of our results rely on a strong refutation algorithm for k-XOR (actually,
a slight generalization thereof). For $m \ge \widetilde{O}(n^{\lceil k/2 \rceil})$ such a result follows from [COCF10]; however,
the exponent $\lceil k/2 \rceil$ can be improved to $k/2$. We give a demonstration of this fact herein; as
mentioned earlier, it was published very recently by Barak and Moitra [BM15, Corollary 5.5 and
Extensions].

Theorem 2.1. There is an efficient algorithm that (whp) strongly refutes random k-XOR instances
with at least $\widetilde{O}(n^{k/2})$ constraints; i.e., it certifies $\mathrm{Opt}(\mathcal{I}) \le \frac{1}{2} + o(1)$.

The proof of Theorem 2.1 follows ideas from [COGL07] and earlier works on “discrepancy”
of random k-SAT instances. The case of even k is notably easier, and we present two “folklore”
arguments for it. The case of odd k is trickier. Roughly speaking we view the instance as a
homogeneous degree-k multilinear polynomial, which we want to certify takes on only small values
on inputs in $\{-1,1\}^n$. Considering separately the contributions based on the “last” of the k
variables in each constraint, and then using Cauchy-Schwarz, it suffices to bound the norm of a
carefully designed quadratic form of dimension $n^{k-1}$, indexed by tuples of $k - 1$ variables. This
is done using the trace method [Wig55, FK81]. Similar techniques, including the use of the trace
method, date back to the 2001 Friedman-Goerdt work [FG01] refuting random 3-SAT given
$m = n^{3/2+\epsilon}$ constraints.

With Theorem 2.1 in hand, the next step is certifying quasirandomness of random k-ary CSP
instances having $m \ge \widetilde{O}(n^{k/2})$ constraints. Roughly speaking we say that a CSP instance is
quasirandom if, for every assignment $x \in \{0,1\}^n$, the m induced k-tuples of literal values are close
to being uniformly distributed over $\{0,1\}^k$. (Note that this is only a property of the instance’s
constraint scopes/negations, and has nothing to do with P.) Since the “Vazirani XOR Lemma”
implies that a distribution on $\{-1,1\}^k$ is uniform if and only if all of its nonempty XORs are unbiased, we
are able to leverage Theorem 2.1 to prove:

Theorem 2.2. There is an efficient algorithm that (whp) certifies that a random instance of
CSP(P) is quasirandom, provided the number of constraints is at least $\widetilde{O}(n^{k/2})$.
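The XOR-lemma step can be checked by brute force on a toy scale. The sketch below computes the bias $\mathbb{E}[\prod_{i \in S} a_i]$ of every nonempty XOR of a distribution on $\{-1,1\}^k$ and tests whether all of them vanish. Representing a distribution as a Python dict from tuples to probabilities, and the names `xor_biases` and `all_xors_unbiased`, are assumptions of this illustration and not part of the paper's algorithm.

```python
from itertools import product
from math import prod

def xor_biases(dist, k):
    """Return {S: E[prod_{i in S} a_i]} for every nonempty subset S of range(k).

    dist maps points of {-1,1}^k (as tuples) to their probabilities.
    """
    biases = {}
    for mask in product((0, 1), repeat=k):
        S = tuple(i for i in range(k) if mask[i])
        if S:
            biases[S] = sum(p * prod(a[i] for i in S) for a, p in dist.items())
    return biases

def all_xors_unbiased(dist, k, tol=1e-12):
    """True when every nonempty XOR has bias 0 (up to tol), i.e. dist is uniform."""
    return all(abs(b) <= tol for b in xor_biases(dist, k).values())
```

On the uniform distribution over $\{-1,1\}^k$ every bias is 0, while a point mass has every bias equal to $\pm 1$.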

If an instance is quasirandom, then no solution can be much better than a randomly chosen 
one. Thus by certifying quasirandomness we are able to strongly refute random instances of any 
CSP(P): 

Theorem 2.3. For any k-ary predicate P, there is an efficient algorithm that (whp) strongly refutes
random CSP(P) instances when the number of constraints is at least $\widetilde{O}(n^{k/2})$.

In particular, this theorem improves upon [COCF10] by a factor of $\sqrt{n}$ whenever k is odd; this
savings is new even in the well-studied case of k-SAT.

The above result does not make use of any properties of the predicate P other than its arity, k.
We now come to our main result, which shows that for many interesting P, random CSP(P)
instances can be refuted with many fewer constraints than $n^{k/2}$. In the following theorem, the
phrase “t-wise uniform” (often imprecisely called “t-wise independent”) refers to a distribution on
$\{-1,1\}^k$ in which all marginal distributions on t out of k coordinates are uniform.

Theorem 2.4. (Main.) Let P be a k-ary predicate such that there is no t-wise uniform distribution
supported on its satisfying assignments, $t \ge 2$. Then there is an efficient algorithm that (whp)
$\Omega_k(1)$-refutes random instances of CSP(P) with at least $\widetilde{O}(n^{t/2})$ constraints.
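The notion of t-wise uniformity in Theorem 2.4 can be tested directly on small examples. The following sketch checks whether every marginal of a distribution on t of the k coordinates is uniform; the dict representation of distributions, the function name, and the tolerance parameter are assumptions of this illustration.

```python
from itertools import combinations, product

def is_t_wise_uniform(dist, k, t, tol=1e-12):
    """Check that every marginal of dist on t of the k coordinates is uniform.

    dist maps points of {-1,1}^k (as tuples) to their probabilities.
    """
    for coords in combinations(range(k), t):
        marginal = {}
        for a, p in dist.items():
            key = tuple(a[i] for i in coords)
            marginal[key] = marginal.get(key, 0.0) + p
        for key in product((-1, 1), repeat=t):
            if abs(marginal.get(key, 0.0) - 2.0 ** -t) > tol:
                return False
    return True

# Uniform distribution on the satisfying assignments of 3-XOR (product equal to 1).
xor3 = {a: 0.25 for a in product((-1, 1), repeat=3) if a[0] * a[1] * a[2] == 1}
```

Here `xor3` is pairwise uniform but not 3-wise uniform. Since 3-XOR supports a 2-wise uniform distribution, the smallest t for which Theorem 2.4 applies to it is $t = 3$, giving the $\widetilde{O}(n^{3/2})$ bound of Theorem 2.1.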

We remark that the property of a predicate P supporting a pairwise uniform distribution has played
an important role in approximability theory for CSPs, ever since Austrin and Mossel [AM09] showed 
that such predicates are hereditarily approximation-resistant under the UGC. Also, note that the 
largest t for which a predicate P supports a t-wise uniform distribution determines the minimum 
number of constraints required by our algorithm. This value is closely related to the notion of 
distribution complexity studied by Feldman, Perkins, and Vempala [FPV14,FPV15] in the context 
of planted random CSPs. Informally, the distribution complexity of a planted CSP is the largest
t for which the distribution over constraint inputs $\{-1,1\}^k$ induced by the planted assignment is
t-wise uniform. Despite this similarity, the algorithmic techniques used by Feldman, Perkins, and 
Vempala in the planted case [FPV14] do not seem to directly apply to refutation. 

The idea behind the proof of Theorem 2.4 is that with $\widetilde{O}(n^{t/2})$ constraints we can use the
algorithm of Theorem 2.2 to certify quasirandomness (closeness to uniformity) for all subsets of t
out of k coordinates. Thus for every assignment $x \in \{0,1\}^n$, the induced distribution on constraint
k-tuples is ($o(1)$-close to) t-wise uniform. Since P does not support a t-wise uniform distribution,
this essentially shows that no x can induce a fully-satisfying distribution on constraint inputs. To
handle the $o(1)$-closeness caveat, we show that if P does not support a t-wise uniform distribution,
then it is $\delta$-far from supporting such a distribution, for $\delta = \Omega_k(1)$. The algorithm can then in fact
$(\delta - o(1))$-refute random CSP(P) instances.

Example 2.5. To briefly illustrate the result, consider the Exactly-k-out-of-2k-SAT CSP, studied
previously in [BB02, GJ03]. The associated predicate supports a 1-wise uniform distribution, namely
the uniform distribution over strings in $\{0,1\}^{2k}$ of Hamming weight k. However, it is not hard
to show that it does not support any pairwise uniform distribution. As a consequence, random
instances of this CSP can be refuted with only $\widetilde{O}(n)$ constraints, independent of k.
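One way to see the claim in Example 2.5, offered here as a sketch rather than as the paper's own argument: in $\pm 1$ notation every satisfying string $x$ has $\sum_i x_i = 0$, so $0 = (\sum_i x_i)^2 = 2k + \sum_{i \ne j} x_i x_j$. Hence under any distribution supported on satisfying strings the pair correlations sum to $-2k$, and they cannot all be 0. The code below checks this for the uniform distribution on balanced strings, where symmetry forces every correlation to equal $-1/(2k-1)$; the function names are assumptions of this illustration.

```python
from itertools import combinations
from fractions import Fraction

def balanced_strings(k):
    """All strings in {-1,1}^{2k} with exactly k coordinates equal to 1."""
    out = []
    for ones in combinations(range(2 * k), k):
        ones = set(ones)
        out.append(tuple(1 if i in ones else -1 for i in range(2 * k)))
    return out

def moments_of_uniform(k):
    """Return (E[x_0], E[x_0 x_1]) under the uniform distribution on balanced strings."""
    support = balanced_strings(k)
    n = len(support)
    mean = Fraction(sum(a[0] for a in support), n)
    corr = Fraction(sum(a[0] * a[1] for a in support), n)
    return mean, corr
```

The first moment is 0 (1-wise uniformity), while the pair correlation is strictly negative for every k, so this distribution is not pairwise uniform.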

2.1 An application from learning theory 

Recently, an exciting approach to proving hardness-of-learning results has been developed by
Daniely, Linial, and Shalev-Shwartz [DLSS13, DLSS14, DSS14, Dan15]. The most general results
appear in [DLSS14]. In this work, Daniely et al. prove computational hardness of several central
learning theory problems, based on two assumptions concerning the hardness of random CSP
refutation. While the assumptions made in [DSS14, Dan15] appear to be plausible, our work
unfortunately shows that the more general assumptions made in [DLSS14] are false.

Below we state the (admittedly strong) assumptions from [DLSS14] (up to some very minor
technical details which are discussed and treated in Section 5). We will need one piece of terminology:
the 0-variability $\mathrm{VAR}_0(P)$ of a predicate $P : \{-1,1\}^k \to \{0,1\}$ is the least c such that there
exists a restriction to some c input coordinates forcing P to be 0. Essentially, the assumptions
state that one can obtain hardness-of-refutation with an arbitrarily large polynomial number of
constraints by using a family of predicates $(P_\kappa)$ that: a) have unbounded 0-variability; b) support
pairwise uniformity. However, our work shows that supporting t-wise uniformity for unbounded t
is also necessary.
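The definition of 0-variability can be checked by exhaustive search on small predicates. The sketch below tries every restriction of c coordinates, for increasing c, and returns the first c at which some restriction forces the predicate to 0. The function names are assumptions of this illustration, and the search takes exponential time, so it is only meant for tiny k.

```python
from itertools import combinations, product

def var0(P, k):
    """Least c such that fixing some c coordinates forces P to 0 (None if P is never forced)."""
    points = list(product((-1, 1), repeat=k))
    for c in range(k + 1):
        for coords in combinations(range(k), c):
            for values in product((-1, 1), repeat=c):
                # All inputs that agree with the restriction on the chosen coordinates.
                consistent = [a for a in points
                              if all(a[i] == v for i, v in zip(coords, values))]
                if all(P(a) == 0 for a in consistent):
                    return c
    return None

def majority(a):
    """Majority predicate on an odd number of +/-1 inputs."""
    return 1 if sum(a) > 0 else 0
```

For instance, `var0(majority, 5)` returns 3, matching the value $\lceil k/2 \rceil$ for the majority predicate.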
SRCSP Assumption 1. ([DLSS14].) For all $d \in \mathbb{N}$ there is a large enough C such that the
following holds: If $P : \{-1,1\}^k \to \{0,1\}$ has $\mathrm{VAR}_0(P) \ge C$ and is hereditarily approximation
resistant on satisfiable instances, then there is no polynomial-time algorithm refuting (whp) random
instances of CSP(P) with $m = n^d$ constraints.

SRCSP Assumption 2. ([DLSS14], generalizing the “RCSP Assumption” of [BKS13] to
superlinearly many constraints.) For all $d \in \mathbb{N}$ there is a large enough C such that the following holds:
If $P : \{-1,1\}^k \to \{0,1\}$ has $\mathrm{VAR}_0(P) \ge C$ and is $\delta$-close to supporting a pairwise uniform
distribution, then for all $\epsilon > 0$ there is no polynomial-time algorithm that $(\delta + \epsilon)$-refutes (whp) random
instances of CSP(P) with $m = n^d$ constraints.

In [DLSS14] it is shown how to obtain three very notable hardness-of-learning results from 
these assumptions. However as stated, our work falsifies the SRCSP Assumptions. Indeed, the 
assumptions are false even in the three specific cases used by [DLSS14] to obtain hardness-of- 
learning results. We now describe these cases. 

Case 1. The Huang predicates $(H_\kappa)$ are arity-$\Theta(\kappa^3)$ predicates introduced in [Hua13]; they are
hereditarily approximation resistant on satisfiable instances and have 0-variability $\Omega(\kappa)$. In [DLSS14]
they are used in SRCSP Assumption 1 to deduce hardness of PAC-learning DNFs with $\omega(1)$ terms.
However:

Theorem 2.6. For all $\kappa \ge 9$, the predicate $H_\kappa$ does not support a 4-wise uniform distribution.

Thus by Theorem 2.4 we can efficiently refute random instances of CSP($H_\kappa$) with just $\widetilde{O}(n^2)$
constraints, independent of $\kappa$. This contradicts SRCSP Assumption 1.

Case 2. The majority predicate $\mathrm{Maj}_k$ has 0-variability $\lceil k/2 \rceil$ and is shown in [DLSS14] to be
$\frac{1}{k+1}$-far from supporting a pairwise uniform distribution. In [DLSS14] these predicates are used in
SRCSP Assumption 2 to deduce hardness of agnostically learning Boolean halfspaces to within any
constant factor. However:

Theorem 2.7. For odd $k \ge 25$, the predicate $\mathrm{Maj}_k$ does not support a 4-wise uniform distribution;
in fact, it is .1-far from supporting a 4-wise uniform distribution.

Theorem 2.4 then implies we can efficiently $\delta$-refute random instances of CSP($\mathrm{Maj}_k$) with $\widetilde{O}(n^2)$
constraints, where $\delta = .1 \gg \frac{1}{k+1}$. This contradicts SRCSP Assumption 2.

Case 3. Finally, we also prove that SRCSP Assumption 1 is false for another family of
predicates $(T_k)$ used by [DLSS14] to show hardness of PAC-learning intersections of 4 Boolean halfspaces.

Our results described in these three cases all use linear programming duality. Specifically, in
Lemma 3.16 we show that P is $\delta$-far from supporting a t-wise uniform distribution if and only
if there exists a k-variable multilinear polynomial Q that satisfies certain properties involving P
and $\delta$. We then explicitly construct these dual polynomials for the Huang, Majority, and $T_k$
predicates.

We conclude this section by emphasizing the importance of the Daniely-Linial-Shalev-Shwartz
hardness-of-learning program, despite the above results. Indeed, subsequently to [DLSS14], Daniely
and Shalev-Shwartz [DSS14] showed hardness of improperly learning DNF formulas with $\omega(\log n)$
terms under a much weaker assumption than SRCSP Assumption 1. Specifically, their work only
assumes that for all d there is a large enough k such that refuting random k-SAT instances is hard
when there are $m = n^d$ constraints. This assumption looks quite plausible to us, and may even
be true with k not much larger than 2d. Most recently, Daniely showed hardness of approximately
agnostically learning halfspaces using the XOR predicate rather than majority [Dan15]. This
result shows that there is no efficient algorithm that agnostically learns halfspaces to within a
constant approximation ratio under the assumption that refuting random k-XOR instances is hard
when $m = n^{c\sqrt{k}\log k}$ for some $c > 0$. He also shows hardness of learning halfspaces to within an
approximation factor of $2^{\log^{1-\nu} n}$ for any $\nu > 0$ assuming that there exists some constant $c > 0$ such
that for all s, refuting random k-XOR instances with $k = \log^s n$ is hard when $m = n^{ck}$.

2.2 Evidence for the optimality of our results 

It’s natural to ask whether the $n^{t/2}$ dependence in our main Theorem 2.4 can be improved. As
we don’t expect to prove unconditional hardness results, we instead merely seek good evidence
that refuting $(t-1)$-wise supporting predicates P is hard when $m \ll n^{t/2}$. A natural form of
evidence would be showing that various strong classes of polynomial-time refutation algorithms fail
when $m \ll n^{t/2}$. To make sense of this we need to talk about the form of such algorithms; i.e.,
propositional proof systems.

Recently, there has been significant study of the “SOS” (Sum-Of-Squares) proof system,
introduced by Grigoriev and Vorobjov [GV01]; see, e.g., [OZ13, BS14] for discussion. It has the following
virtues: a) it is very powerful, being able to efficiently simulate other proof systems (e.g.,
Resolution, Lovász-Schrijver); b) it is automatizable [Las00, Par00], meaning that n-variable “degree-d
SOS proofs” can be found in time $n^{O(d)}$ whenever they exist; c) we do know some examples of lower
bounds for degree-d SOS proofs. In Section 6 of this paper we show the following:

Theorem 2.8. All of our refutation algorithms for k-ary predicates can be extended to produce 
degree-2k SOS proofs. 

We now return to the question of evidence for the optimality of constraint density used in
our results. Dating back to Franco-Paull [FP83] and Chvátal-Szemerédi [CS88], there has been a
long line of work in proof complexity showing lower bounds for refuting random 3-SAT instances,
especially in the Resolution proof system. This culminated in the work of Ben-Sasson and
Wigderson [BSW99], which showed that for random 3-SAT (and 3-XOR) with $m = O(n^{3/2-\epsilon})$, Resolution
refutations require size $2^{n^{\Omega(\epsilon)}}$ (whp). More recently, Schoenebeck [Sch08] showed (using the
expansion analysis of [BSW99]) that random k-XOR and k-SAT instances with $m \le n^{k/2-\epsilon}$ require SOS
proofs of degree $n^{\Omega(\epsilon)}$, and hence take time $2^{n^{\Omega(\epsilon)}}$ to refute by the “SOS Method”. See [Tul09, Cha13]
for related larger-alphabet followups. These results show that the Barak-Moitra bound for
refuting random k-XOR (which also works in $O(k)$-degree SOS) and our bound for random k-SAT
are tight (up to subpolynomial factors) within the SOS framework. Given the power of the SOS
framework, this arguably constitutes some reasonable evidence that no polynomial-time algorithm
can refute random k-SAT instances with $m \ll n^{k/2}$.

We now discuss our main theorem’s $n^{t/2}$ bound for predicates P not supporting t-wise uniform
distributions. Suppose P is a predicate that does support a $(t-1)$-wise uniform distribution,
where $t \ge 2$. In the context of inapproximability and SDP-hierarchy integrality gaps, this condition
on P has been significantly studied in the case of $t = 3$. For P supporting pairwise uniformity,
it is known [BGMT12, TW13] that the Sherali-Adams$^+$ and Lovász-Schrijver$^+$ SDP hierarchies
require degree $\Omega(n)$ to refute random CSP(P) instances (whp) when $m = O(n)$. This result was
also recently proven for the stronger SOS proof system by Barak, Chan, and Kothari [BCK15],
except that the CSP(P) instances are not quite uniformly random; they are “slightly pruned”
random instances. For any $t \ge 2$, the second and third authors recently essentially showed [OW14]
that for the Sherali-Adams$^+$ SDP hierarchy, degree $n^{\Omega(\epsilon)}$ is (whp) necessary to refute random
CSP(P) instances when $m \le n^{t/2-\epsilon}$. As a caveat, again the instances are slightly pruned random
instances, rather than being purely uniformly random. (The instances in [OW14] are also in
the “Goldreich [Gol00] style”; i.e., there are no literals, but the “right-hand sides” are random.
However it is not hard to show the proofs in [OW14] go through in the standard random model
of this paper.) Future work [MWW15] is devoted to removing the pruning in these instances.
Although the Sherali-Adams$^+$ SDP hierarchy is certainly weaker than the SOS hierarchy, these
works still constitute some evidence that our main theorem’s requirement of $m \ge \widetilde{O}(n^{t/2})$ for
non-t-wise supporting predicates may be essentially optimal.

Further evidence for the optimality of our results is provided by the work of Feldman, Perkins, 
and Vempala on statistical algorithms for random planted CSPs [FPV15]. They show that their 
lower bounds against statistical algorithms for solving random planted CSPs also imply lower
bounds against statistical algorithms for refuting uniformly random CSPs. Specifically, they prove
that when P supports a $(t-1)$-wise uniform distribution, any statistical algorithm using queries
that take possible values can only refute random instances of CSP(P) with at least

constraints. As an application of this result, they also show that any convex program refuting such 
an instance of CSP(P) must have dimension at least 

2.3 Certifying independence number and chromatic number of random hypergraphs

Coja-Oghlan, Goerdt, and Lanka [COGL07] also use their CSP refutation techniques to certify
that random 3- and 4-uniform hypergraphs have small independence number and large chromatic
number. We extend these results to random k-uniform hypergraphs.

Theorem 2.9. For a random k-uniform hypergraph H, there is a polynomial time algorithm
certifying that the independence number of H is at most $\beta$ with high probability when H has at least
O hyperedges. 

Theorem 2.10. For a random k-uniform hypergraph H, there is a polynomial time algorithm 
certifying that the chromatic number of H is at least f with high probability when H has at least 
O hyperedges. 

The proofs of these theorems follow the outline of [COGL07]. We show Theorem 2.9 using a
slightly more general form of Theorem 2.1. Theorem 2.10 follows almost directly from Theorem 2.9 
using the fact that every color class of a valid coloring is an independent set. Details are given in 
Appendix C. 
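The step from independence number to chromatic number is a counting argument: a valid coloring partitions the n vertices into independent sets, each of size at most the independence number, so the number of colors is at least n divided by that number. A minimal sketch, with hypothetical function names and with hyperedges represented as tuples of vertex indices:

```python
from math import ceil

def is_independent(vertices, hyperedges):
    """True if no hyperedge lies entirely inside the given vertex set."""
    vs = set(vertices)
    return not any(set(e) <= vs for e in hyperedges)

def chromatic_lower_bound(n, independence_upper_bound):
    """Lower bound on the chromatic number from an upper bound on the independence number.

    Every color class is an independent set, so at least n / alpha colors are needed.
    """
    return ceil(n / independence_upper_bound)
```

Thus any certificate that the independence number is at most $\beta$ immediately certifies that the chromatic number is at least $\lceil n/\beta \rceil$.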

3 Preliminaries and notation 

3.1 Constraint satisfaction problems 

We review some basic definitions and facts related to constraint satisfaction problems (CSPs). In
this section we discuss only the Boolean domain, which we prefer to write as $\{-1,1\}$ rather than
$\{0,1\}$. The straightforward extensions to larger domains appear in Section B. We will need the
following notation: For $x \in \mathbb{R}^n$ and $S \subseteq [n]$ we write $x_S \in \mathbb{R}^{|S|}$ for the restriction of x to coordinates
S; i.e., $(x_i)_{i \in S}$. We also use $\circ$ to denote the entrywise product for vectors.



Definition 3.1. Given a predicate $P : \{-1,1\}^k \to \{0,1\}$, an instance $\mathcal{I}$ of the CSP(P) problem
over variables $x_1, \ldots, x_n$ is a multiset of P-constraints. Each P-constraint consists of a pair $(c, S)$,
where $S \in [n]^k$ is the scope and $c \in \{-1,1\}^k$ is the negation pattern; this represents the constraint
$P(c \circ x_S) = 1$. We typically write $m = |\mathcal{I}|$. Let $\mathrm{Val}_{\mathcal{I}}(x)$ be the fraction of constraints satisfied
by assignment $x \in \{-1,1\}^n$, i.e., $\mathrm{Val}_{\mathcal{I}}(x) = \frac{1}{m} \sum_{(c,S) \in \mathcal{I}} P(c \circ x_S)$. The objective is to find an
assignment x maximizing $\mathrm{Val}_{\mathcal{I}}(x)$. The optimum of $\mathcal{I}$, denoted by $\mathrm{Opt}(\mathcal{I})$, is $\max_{x \in \{-1,1\}^n} \mathrm{Val}_{\mathcal{I}}(x)$.

If $\mathrm{Opt}(\mathcal{I}) = 1$, we say that $\mathcal{I}$ is satisfiable. We also write $\overline{P}$ for the quantity $\mathbb{E}_{z \sim \{-1,1\}^k}[P(z)]$; i.e.,
the fraction of assignments that satisfy P. For any instance $\mathcal{I}$ in which each constraint involves k
different variables, we have $\mathrm{Opt}(\mathcal{I}) \ge \overline{P}$.^5
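To make Definition 3.1 concrete, here is a brute-force sketch of the value and optimum of an instance. Representing a constraint as a pair `(c, S)` of tuples with 0-indexed scopes, and the names `val` and `opt`, are assumptions of this illustration; `opt` enumerates all $2^n$ assignments and is only usable for tiny n.

```python
from itertools import product

def val(instance, P, x):
    """Fraction of constraints (c, S) in the instance with P(c o x_S) = 1."""
    sat = sum(P(tuple(ci * x[i] for ci, i in zip(c, S))) for c, S in instance)
    return sat / len(instance)

def opt(instance, P, n):
    """Maximum of val over all assignments in {-1,1}^n (exponential time)."""
    return max(val(instance, P, x) for x in product((-1, 1), repeat=n))
```

For example, with the 2-ary predicate that accepts exactly the inputs whose product is 1, the instance `[((1, 1), (0, 1)), ((1, -1), (0, 1))]` consists of two contradictory constraints and has optimum 1/2.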

We next define a standard random model for CSPs. For $P : \{-1,1\}^k \to \{0,1\}$, let $\mathcal{F}_P(n,p)$
be the distribution over CSP instances given by including each of the $2^k n^k$ possible constraints
independently with probability $p$. Note that we may include constraints on different permutations
of the same set of variables, constraints on the same tuple of variables with different negations $c$,
and constraints with the same variable occurring as more than one argument. It is reasonable to
include such constraints in the case that the predicate $P$ is not a symmetric function. We use $\overline{m}$ to
denote the expected number of constraints, namely $2^k n^k p$. As noted in Fact 3.6 below, the number
of constraints $m$ in a draw from $\mathcal{F}_P(n,p)$ is very tightly concentrated around $\overline{m}$, and we often blur
the distinction between these parameters. Appendix D explicitly describes a method for simulating
an instance drawn from $\mathcal{F}_P(n,p)$ when the number of constraints is fixed.
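
As an illustration of the random model, the following is a minimal Python sketch, not part of the paper. It samples an instance by flipping one coin per possible constraint and evaluates the value and optimum by brute force. The names `sample_instance`, `value`, `optimum`, and `or3` are hypothetical, and the convention that $-1$ encodes "true" for the OR predicate is an assumption made only for this example.

```python
import itertools
import random

def sample_instance(n, k, p, rng):
    """Include each of the 2^k * n^k possible constraints independently with probability p."""
    instance = []
    for scope in itertools.product(range(n), repeat=k):
        for negation in itertools.product((-1, 1), repeat=k):
            if rng.random() < p:
                instance.append((negation, scope))
    return instance

def value(instance, predicate, x):
    """Fraction of constraints (c, S) with P(c o x_S) = 1."""
    satisfied = sum(
        predicate(tuple(c_i * x[i] for c_i, i in zip(c, scope)))
        for c, scope in instance
    )
    return satisfied / len(instance)

def optimum(instance, predicate, n):
    """Brute-force maximum of the value over all 2^n assignments (tiny n only)."""
    return max(value(instance, predicate, x)
               for x in itertools.product((-1, 1), repeat=n))

def or3(z):
    # Assumed convention for this sketch: -1 plays the role of "true".
    return int(any(z_i == -1 for z_i in z))

rng = random.Random(0)
n, k, p = 6, 3, 0.05
instance = sample_instance(n, k, p, rng)   # expected size 2^k * n^k * p = 86.4
opt = optimum(instance, or3, n)
```

The enumeration is exponential in both $n$ and $k$, so this is only useful for building intuition on toy sizes.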

Quasirandomness. We now introduce an important notion for this paper: that of a CSP instance 
being quasirandom. Versions of this notion originate in the works of Goerdt and Lanka [GL03] 
(under the name “discrepancy”), Khot [Kho06] (“quasi-randomness”), Austrin and Håstad [AH13]
(“adaptive uselessness”), and Chan [Cha13] (“low correlation”), among other places. To define it,
we first need to define the induced distribution of an instance and an assignment. 

Definition 3.2. Given a CSP instance $\mathcal{I}$ and an assignment $x \in \{-1,1\}^n$, the induced
distribution, denoted $\mathcal{D}_{\mathcal{I},x}$, is the probability distribution on $\{-1,1\}^k$ where the probability mass on
$\alpha \in \{-1,1\}^k$ is given by $\mathcal{D}_{\mathcal{I},x}(\alpha) = \frac{1}{|\mathcal{I}|} \cdot \#\{(c,S) \in \mathcal{I} \mid c \circ x_S = \alpha\}$. In other words, it is the empirical
distribution on inputs to $P$ generated by the constraint scopes/negations on assignment $x$. Note
that the predicate $P$ itself is irrelevant to this notion. We will drop the subscript $\mathcal{I}$ when it is clear
from the context. We define $D_{\mathcal{I},x} = 2^k \cdot \mathcal{D}_{\mathcal{I},x}$ to be the density function associated with $\mathcal{D}_{\mathcal{I},x}$.

We can now define quasirandomness. 

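Definition 3.3. A CSP instance $\mathcal{I}$ is $\epsilon$-quasirandom if $\mathcal{D}_{\mathcal{I},x}$ is $\epsilon$-close to the uniform distribution
for all $x \in \{-1,1\}^n$; i.e., if $d_{\mathrm{TV}}(\mathcal{D}_{\mathcal{I},x}, U^k) \le \epsilon$ for all $x \in \{-1,1\}^n$.

Here we use the notation $U^k$ for the uniform distribution on $\{-1,1\}^k$, as well as the following:

Definition 3.4. If $\mathcal{D}$ and $\mathcal{D}'$ are probability distributions on the same finite set $A$ then $d_{\mathrm{TV}}(\mathcal{D}, \mathcal{D}')$
denotes their total variation distance, $\frac{1}{2} \sum_{\alpha \in A} |\mathcal{D}(\alpha) - \mathcal{D}'(\alpha)|$. If $d_{\mathrm{TV}}(\mathcal{D}, \mathcal{D}') \le \epsilon$ we say that $\mathcal{D}$
and $\mathcal{D}'$ are $\epsilon$-close. If $d_{\mathrm{TV}}(\mathcal{D}, \mathcal{D}') \ge \epsilon$ we say they are $\epsilon$-far. (As neither inequality is strict, these
notions are not quite opposites.)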
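
To make the induced distribution and the quasirandomness condition concrete, here is a small brute-force sketch in Python. It is an illustration only: the function names are hypothetical, constraints are represented as `(negation, scope)` pairs by assumption, and the maximum over all $2^n$ assignments is feasible only for toy instances.

```python
import itertools
from collections import Counter

def induced_distribution(instance, x, k):
    """Empirical distribution of c o x_S over the constraints (c, S) of the instance."""
    counts = Counter(
        tuple(c_i * x[i] for c_i, i in zip(c, scope)) for c, scope in instance
    )
    m = len(instance)
    return {alpha: counts.get(alpha, 0) / m
            for alpha in itertools.product((-1, 1), repeat=k)}

def tv_distance(d1, d2):
    """Total variation distance: half the L1 distance between the mass functions."""
    return 0.5 * sum(abs(d1[a] - d2[a]) for a in d1)

def quasirandomness(instance, n, k):
    """Smallest eps for which the instance is eps-quasirandom (brute force over x)."""
    uniform = {alpha: 2.0 ** -k for alpha in itertools.product((-1, 1), repeat=k)}
    return max(
        tv_distance(induced_distribution(instance, x, k), uniform)
        for x in itertools.product((-1, 1), repeat=n)
    )

# The instance containing every possible constraint once is perfectly quasirandom.
full = [(c, S)
        for S in itertools.product(range(2), repeat=2)
        for c in itertools.product((-1, 1), repeat=2)]
# A single constraint induces a point mass, which is far from uniform.
single = [((1, 1), (0, 1))]
```

Note that the predicate never appears in this code, matching the remark in Definition 3.2 that the induced distribution does not depend on $P$.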

An immediate consequence of an instance being quasirandom is that its optimum is close to $\overline{P}$:

†Technically, our definitions allow constraints with a variable appearing more than once, so $\mathrm{Opt}(\mathcal{I}) \ge \overline{P}$ doesn’t
always hold for us. But since we only consider random $\mathcal{I}$, we’ll in fact have $\mathrm{Opt}(\mathcal{I}) \approx \overline{P}$ whp over $\mathcal{I}$ anyway.

Fact 3.5. If $\mathcal{I}$ is $\epsilon$-quasirandom, then $\mathrm{Opt}(\mathcal{I}) \le \overline{P} + \epsilon$ (and in fact, $|\mathrm{Opt}(\mathcal{I}) - \overline{P}| \le \epsilon$).

We conclude the discussion of CSPs by recording some facts that are proven easily with the 
Chernoff bound: 

Fact 3.6. Let $\mathcal{I} \sim \mathcal{F}_P(n,p)$. Then the following statements hold with high probability.

1 . m = 

2. opt(i)<p.(i+o(yr:i)). 

3. I is O ^^2^ • -quasirandom. 

3.2 Algorithms and refutations on random CSPs 

Definition 3.7. Let $P$ be a Boolean predicate. We say that $A$ is a $\delta$-refutation algorithm for random
CSP($P$) with $\overline{m}$ constraints if $A$ has the following properties. First, on all instances $\mathcal{I}$ the output
of $A$ is either the statement “$\mathrm{Opt}(\mathcal{I}) \le 1 - \delta$” or is “fail”. Second, $A$ is never allowed to err, where
erring means outputting “$\mathrm{Opt}(\mathcal{I}) \le 1 - \delta$” on an instance which actually has $\mathrm{Opt}(\mathcal{I}) > 1 - \delta$.
Finally, $A$ must satisfy

$$\Pr_{\mathcal{I} \sim \mathcal{F}_P(n,p)}[A(\mathcal{I}) = \text{“fail”}] < o(1) \quad (\text{as } n \to \infty),$$

where $p$ is defined by $\overline{m} = 2^k n^k p$. Although $A$ is often a deterministic algorithm, we do allow it to
be randomized, in which case the above probability is also over the “internal random coins” of $A$.

We refer to this notion as weak refutation, or simply refutation, when the certification statement
is of the form “$\mathrm{Opt}(\mathcal{I}) < 1$” (equivalently, when $\delta = 1/|\mathcal{I}|$). We refer to the notion as strong
refutation when the statement is of the form “$\mathrm{Opt}(\mathcal{I}) \le \overline{P} + o(1)$” (equivalently, when
$\delta = 1 - \overline{P} - o(1)$).

Remark 3.8. In Section 5 we will encounter a “two-sided error” variant of this definition. This
is the slightly easier algorithmic task in which we relax the condition on erring: it is only required
that for each instance $\mathcal{I}$ with $\mathrm{Opt}(\mathcal{I}) > 1 - \delta$, it holds that $\Pr[A(\mathcal{I}) = \text{“}\mathrm{Opt}(\mathcal{I}) \le 1 - \delta\text{”}] \le 1/4$,
where the probability is just over the random coins of $A$.

Remark 3.9. We will also use the analogous definition for certification of related properties; e.g.,
we will discuss $\epsilon$-quasirandomness certification algorithms in which the statement “$\mathrm{Opt}(\mathcal{I}) \le 1 - \delta$”
is replaced by the statement “$\mathcal{I}$ is $\epsilon$-quasirandom”.

3.3 t-wise uniformity 

An important notion for this paper is that of t-wise uniformity. Recall: 

Definition 3.10. Probability distribution $\mathcal{D}$ on $\{-1,1\}^k$ is said to be $t$-wise uniform, $1 \le t \le k$, if
for all $S \subseteq [k]$ with $|S| = t$ the random variable $x_S$ is uniform on $\{-1,1\}^t$ when $x \sim \mathcal{D}$. (We remark
that this condition is sometimes inaccurately called “$t$-wise independence” in the literature.)

We will also consider the more general notion of $(\epsilon, t)$-wise uniformity. This is typically defined
using Fourier coefficients:

Definition 3.11. Probability distribution $\mathcal{D}$ on $\{-1,1\}^k$ is said to be $(\epsilon,t)$-wise uniform if
$|\widehat{D}(S)| \le \epsilon$ for all $S \subseteq [k]$ with $0 < |S| \le t$, where $D = 2^k \cdot \mathcal{D}$ is the probability density associated
with distribution $\mathcal{D}$.

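Here we are using standard notation from Fourier analysis of Boolean functions [O’D14]. In
particular, for any $f : \{-1,1\}^k \to \mathbb{R}$ we write $f(x) = \sum_{S \subseteq [k]} \widehat{f}(S)\, x^S$ for its expansion as a
multilinear polynomial over $\mathbb{R}$, with $x^S$ denoting $\prod_{i \in S} x_i$ (not to be confused with the projection
$x_S \in \mathbb{R}^{|S|}$).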
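
The following Python sketch shows one way to compute these Fourier coefficients by brute force and to measure how far a distribution is from being $t$-wise uniform in the sense of Definition 3.11. The helper names are hypothetical, and the example distribution (uniform on the strings of $\{-1,1\}^3$ whose coordinates multiply to $1$) is chosen only for illustration.

```python
import itertools
from math import prod

def fourier_coefficient(f, S, k):
    """hat f(S) = E_{z uniform on {-1,1}^k}[ f(z) * prod_{i in S} z_i ]."""
    cube = list(itertools.product((-1, 1), repeat=k))
    return sum(f(z) * prod(z[i] for i in S) for z in cube) / len(cube)

def uniformity_bias(dist, k, t):
    """Smallest eps such that dist (a dict z -> probability) is (eps, t)-wise uniform."""
    def density(z):
        # Density D = 2^k * (probability mass), as in Definition 3.11.
        return (2 ** k) * dist.get(z, 0.0)
    return max(
        abs(fourier_coefficient(density, S, k))
        for size in range(1, t + 1)
        for S in itertools.combinations(range(k), size)
    )

# Uniform distribution on {z in {-1,1}^3 : z_1 z_2 z_3 = 1}.
even_parity = {z: 0.25
               for z in itertools.product((-1, 1), repeat=3)
               if prod(z) == 1}
```

On this example the bias is $0$ for $t = 2$ (the distribution is pairwise uniform) but $1$ for $t = 3$, since the degree-3 coefficient picks up the parity constraint.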

Remark 3.12. It is a simple fact (and it follows from Lemma 3.13 below) that (0, t)-wise uniformity 
is equivalent to t-wise uniformity. 

Also important for us is a related but distinct notion, that of being $\epsilon$-close to a $t$-wise uniform
distribution. It’s easy to show that if $\mathcal{D}$ is $\epsilon$-close to a $t$-wise uniform distribution then $\mathcal{D}$ is
$(2\epsilon, t)$-wise uniform. In the other direction, we have the following (see also [AAK+07] for some
quantitative improvement):

Lemma 3.13. (Alon-Goldreich-Mansour [AGM03, Theorem 2.1]). If $\mathcal{D}$ is an $(\epsilon,t)$-wise uniform
distribution on $\{-1,1\}^k$, then there exists a $t$-wise uniform distribution $\mathcal{D}'$ on $\{-1,1\}^k$ with

$$d_{\mathrm{TV}}(\mathcal{D}, \mathcal{D}') \le \left( \sum_{i=1}^{t} \binom{k}{i} \right) \cdot \epsilon.$$

In particular if $t = k$ we have the bound $2^k \cdot \epsilon$ (and this can also be improved; see [Gol11]).

Finally, we make a crucial definition: 

Definition 3.14. A predicate $P : \{-1,1\}^k \to \{0,1\}$ is said to be $t$-wise supporting if there is
a $t$-wise uniform distribution $\mathcal{D}$ whose support is contained in $P^{-1}(1)$. We say $P$ is $\delta$-far from
$t$-wise supporting if every $t$-wise uniform distribution $\mathcal{D}$ is $\delta$-far from being supported on $P$; i.e.,
has probability mass at least $\delta$ on $P^{-1}(0)$.

3.4 A dual characterization of limited uniformity 

It is known that the condition of P supporting a t-wise uniform distribution is equivalent to the 
feasibility of a certain linear program; hence one can show that P is not t-wise supporting by 
exhibiting a certain dual object, namely a polynomial. This appears, e.g., in work of Austrin and 
Håstad [AH09, Theorem 3.1]. Herein we will extend this fact by giving a dual characterization of
being far from t-wise supporting. 

Definition 3.15. Let $0 < \delta \le 1$. For a multilinear polynomial $Q : \{-1,1\}^k \to \mathbb{R}$, we say that $Q$
$\delta$-separates $P : \{-1,1\}^k \to \{0,1\}$ if the following conditions hold:

• $Q(z) \ge \delta - 1 \quad \forall z \in \{-1,1\}^k$;

• $Q(z) \ge \delta \quad \forall z \in P^{-1}(1)$;

• $\widehat{Q}(\emptyset) = 0$, i.e., $Q$ has no constant coefficient.
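
The three conditions of Definition 3.15 can be checked mechanically for a small predicate. The sketch below is an illustration in Python with hypothetical names; the choice of 3-XOR with the convention "satisfied when the product of the inputs is $1$" is an assumption made for the example, and the candidate polynomial $Q(z) = \frac{1}{2} z_1 z_2 z_3$ is ours.

```python
import itertools
from math import prod

def separates(Q, P, k, delta, tol=1e-12):
    """Check the three conditions of delta-separation by enumerating {-1,1}^k."""
    cube = list(itertools.product((-1, 1), repeat=k))
    # The constant coefficient of Q is its average over the cube.
    no_constant = abs(sum(Q(z) for z in cube) / len(cube)) <= tol
    lower_everywhere = all(Q(z) >= delta - 1 - tol for z in cube)
    lower_on_satisfying = all(Q(z) >= delta - tol for z in cube if P(z) == 1)
    return no_constant and lower_everywhere and lower_on_satisfying

def xor3(z):
    # Assumed convention for this sketch: satisfied when z_1 z_2 z_3 = 1.
    return int(prod(z) == 1)

def Q(z):
    # Degree-3 candidate with no constant term.
    return 0.5 * prod(z)
```

Here `separates(Q, xor3, 3, 0.5)` holds while `separates(Q, xor3, 3, 0.6)` does not, so this particular $Q$ is a degree-3 polynomial that $\frac{1}{2}$-separates the predicate.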

We now provide the quantitative version of the aforementioned [AH09, Theorem 3.1]: 

Lemma 3.16. Let $P : \{-1,1\}^k \to \{0,1\}$ and let $0 < \delta \le 1$. Then $P$ is $\delta$-far from $t$-wise supporting
if and only if there is a $\delta$-separating polynomial for $P$ of degree at most $t$.
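Proof. The proof is an application of linear programming duality. Consider the following LP, which
has variables $\mathcal{D}(z)$ for each $z \in \{-1,1\}^k$.

$$\begin{aligned}
\text{minimize} \quad & \sum_{z \in \{-1,1\}^k} (1 - P(z))\, \mathcal{D}(z) && (1) \\
\text{s.t.} \quad & \sum_{z \in \{-1,1\}^k} \mathcal{D}(z)\, z^S = 2^k \cdot \widehat{\mathcal{D}}(S) = 0 \quad \forall S \subseteq [k],\ 0 < |S| \le t && (2) \\
& \sum_{z \in \{-1,1\}^k} \mathcal{D}(z) = 1 && (3) \\
& \mathcal{D}(z) \ge 0 \quad \forall z \in \{-1,1\}^k
\end{aligned}$$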

Constraint (3) and the nonnegativity constraint ensure that $\mathcal{D}$ is a probability distribution on $\{-1,1\}^k$.
Constraint (2) expresses that $\mathcal{D}$ is $t$-wise uniform (see Remark 3.12). The objective (1) is minimizing
the probability mass that $\mathcal{D}$ puts on assignments in $P^{-1}(0)$. Thus the optimal value of the LP
is equal to the smallest $\gamma$ such that $P$ is $\gamma$-close to $t$-wise supporting; equivalently, the largest $\gamma$
such that $P$ is $\gamma$-far from $t$-wise supporting.

The following is the dual of the above LP. It has a variable $c(S)$ for each $0 < |S| \le t$ as well as
a variable $\xi$ corresponding to constraint (3).

$$\begin{aligned}
\text{maximize} \quad & \xi && (4) \\
\text{s.t.} \quad & \sum_{S \subseteq [k],\ 0 < |S| \le t} c(S)\, z^S \le 1 - P(z) - \xi \quad \forall z \in \{-1,1\}^k. && (5)
\end{aligned}$$

Observe that a feasible solution $(\{c(S)\}_S, \xi)$ is precisely equivalent to a multilinear polynomial $Q$
of degree at most $t$, namely $Q(z) = -\sum_S c(S)\, z^S$, that $\xi$-separates $P$.

Thus $P$ is $\delta$-far from $t$-wise supporting if and only if the LP’s objective (1) is at least $\delta$, if and
only if the dual’s objective (4) is at least $\delta$, if and only if there is a $\delta$-separating polynomial for $P$
of degree at most $t$. □

From this proof we can also derive that if $P$ fails to be $t$-wise supporting then it must in fact
be $\Omega_k(1)$-far from being $t$-wise supporting:

Corollary 3.17. Suppose $P : \{-1,1\}^k \to \{0,1\}$ is not $t$-wise supporting. Then it is in fact $\delta$-far
from $t$-wise supporting for $\delta = 2^{-\widetilde{O}(k^t)}$ (or $\delta = 2^{-\widetilde{O}(2^k)}$ when $t = k$).

Proof. Let $K = 1 + \sum_{i=1}^{t} \binom{k}{i}$ be the number of variables in the dual LP from Lemma 3.16, so
$K \le k^t + 1$ in general, with $K \le 2^k$ when $t = k$. By assumption, the objective (4) of the dual LP’s
optimal solution is strictly positive. This optimum occurs at a vertex, which is the solution of a
linear system given by a $K \times K$ matrix of $\pm 1$ entries and a “right-hand side” vector with $0,1$ entries.
By Cramer’s rule, the solution’s entries are ratios of determinants of integer matrices with entries
in $\{-1,0,1\}$. Thus any strictly positive entry is at least $1/N$, where $N$ is the maximum possible
such determinant. By Hadamard’s inequality, $N \le K^{K/2}$, and the claimed result follows. □
4 Quasirandomness and its implications for refutation


4.1 Strong refutation of $k$-XOR

In this section, we state our result on strong refutation of random $k$-XOR instances with
$\overline{m} = \widetilde{O}(n^{k/2})$ constraints. (Recall that essentially this result was very recently obtained by Barak
and Moitra [BM15].) We actually have a slightly more general result, allowing variables and
coefficients to take values in $[-1,1]$ and not just in $\{-1,1\}$. We will use this additional freedom
to prove refutation results for CSPs over larger alphabets in Appendix B and refutation results for
independence number and chromatic number of random hypergraphs in Appendix C.

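Theorem 4.1. For $k \ge 2$ and $p \ge$ […], let $\{w(T)\}_{T \in [n]^k}$ be independent random variables such
that for each $T \in [n]^k$,

$$\mathbf{E}[w(T)] = 0 \qquad (6)$$

$$\Pr[w(T) \ne 0] \le p \qquad (7)$$

$$|w(T)| \le 1. \qquad (8)$$

Then there is an efficient algorithm certifying that

$$\sum_{T \in [n]^k} w(T)\, x^T \le 2^{O(k)} \sqrt{p}\; n^{3k/4} \log^{3/2} n$$

for all $x \in \mathbb{R}^n$ such that $\|x\|_\infty \le 1$ with high probability.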

In this form, the theorem is not really about CSP refutation at all. It says that the value of a 
polynomial with random coefficients is close to its expectation when its inputs are bounded. 

We give the proof in Appendix A. It follows techniques from [COGL07] fairly closely and is 
essentially the same as the proof of [BM15]. We will use this theorem to prove our results in 
subsequent sections. 

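We obtain strong refutation of $k$-XOR as a simple corollary.

Corollary 4.2. For $k \ge 2$, let $\mathcal{I} \sim \mathcal{F}_{k\text{-XOR}}(n,p)$. Then, with high probability, there is a degree-$2k$
SOS proof that $\mathrm{Opt}(\mathcal{I}) \le \frac{1}{2} + \gamma$ when $\overline{m} \ge \frac{2^{O(k)} n^{k/2} \log^3 n}{\gamma^2}$.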

Proof. We can write the /c-XOR predicate as 


so for a /c-XOR instance I ■ 


fc-XOR(z) = 
J^k-xoRin,p), 




{{T,c)eX}X 




'ik—1 


T£[n]^ cE{dil}^ 

1 ; 


i£[k] 


m 


Y w{T)x'^, 
Te[n]'= 


where w{T) = —2 ^ ^{(Rcjex} Ojelfc] rc(T)’s are random variables depending on 

the choice of X; observe that E[rc(T)] = 0, Pr[t(;(r) / 0] < 2^p, and |rc(T)| < 1 for all T G [n]^. 
By Theorem 4.1, there is an algorithm certifying that 

Opt X < - +- ^ -2-. 

2 m 

with high probability when p > . Since m = (1 -|- o(l))m with high probability, choosing 

— ^ 2 °^ log n gjygg desired result. □ 

As an example, we can choose 7 = and certify that Opt(X) < ^ -|-o(l) when m = 
4.2 Quasirandomness and strong refutation of any $k$-CSP

Next, we will use the algorithm of Theorem 4.1 to certify that an instance of CSP($P$) is quasirandom.
This will immediately give us a strong refutation algorithm.

In order to certify quasirandomness, Lemma 3.13 implies that it suffices to certify each Fourier
coefficient of $D_{\mathcal{I},x}$ has small magnitude.

Lemma 4.3. Let $\emptyset \ne S \subseteq [k]$ with $|S| = s$. There is an algorithm that, with high probability,
certifies that

$$\left| \widehat{D}_{\mathcal{I},x}(S) \right| \le \frac{2^{O(k)} \max\{n^{s/4}, \sqrt{n}\} \log^{5/2} n}{\overline{m}^{1/2}}$$

for all $x \in \{-1,1\}^n$, assuming also that $\overline{m} \ge \max\{n^{s/2}, n\}$.

To prove this lemma, we need another lemma certifying that polynomials whose coefficients are
sums of 0-mean random variables have small value.


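Lemma 4.4. Let $S \subseteq [k]$ with $|S| = s > 0$. Let $r \in \mathbb{N}$ and let $\{w_U(i)\}_{U \in [n]^s,\, i \in [r]}$ be independent
random variables satisfying conditions (6), (7), and (8) for some $p \ge$ […]. Then there is an
algorithm that certifies with high probability that

$$\sum_{U \in [n]^s} x^U \sum_{i=1}^{r} w_U(i) \le \begin{cases} 2^{O(s)} \sqrt{rp}\; n^{3s/4} \log^{5/2} n & \text{if } s \ge 2 \\ 4 \max\{\sqrt{rp}, 1\} \cdot n \log n & \text{if } s = 1 \end{cases}$$

for all $x \in \mathbb{R}^n$ such that $\|x\|_\infty \le 1$.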

The proof is straightforward and we defer it to Section 4.4. 


Proof of Lemma 4-3. Without loss of generality, assume 1 G S. Applying definitions, we see that 

Dx,xiS) = E[z^] = — ^{iT,c)eX}C^= — Y Y Y ws{T,(f). 

' U&[nY T&[nY c&{±lfi Ue[nY Te[n]''c'sfil}''-! 

Ts=U Ts=U 

(9) 

where we define ws{T, d') = l{('r,(-i,c'))ex}(c0‘^'^^^^ recall that Tg is the 

projection of T onto the the coordinates in S. It is clear that E[t(; 5 (T, c')] = 0, Pr[t(; 5 (r, c') 

0] < p, and |rc 5 (T,c')| < 1. There are r = terms in each sum of ws{T,dys and we 

can apply Lemma 4.4. When s = 2, we plug in these values and see that we can certify that 
Dx,x{S) < ^ ^ ^ When s = 1, m > n implies that rp > ^ and we can certify that 

Dx,x{S) < ^ ^ ^ ■ The lower bound can be proved in exactly the same way by considering the 

random variables—ri; 5 (T, c'). □ 


The existence of an algorithm for certifying quasirandomness follows from Lemmas 3.13 and 4.3.

Theorem 4.5. There is an efficient algorithm that certifies that an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of
CSP($P$) is $\gamma$-quasirandom with high probability when $\overline{m} \ge \frac{2^{O(k)} n^{k/2} \log^5 n}{\gamma^2}$.

Since $\gamma$-quasirandomness implies that $\mathrm{Opt}(\mathcal{I}) \le \overline{P} + \gamma$, this algorithm also strongly refutes
CSP($P$).

Theorem 4.6. There is an efficient algorithm that, given an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of CSP($P$),
certifies that $\mathrm{Opt}(\mathcal{I}) \le \overline{P} + \gamma$ with high probability when $\overline{m} \ge \frac{2^{O(k)} n^{k/2} \log^5 n}{\gamma^2}$.
4.3 $(\epsilon, t)$-quasirandomness and $\Omega(1)$-refutation of non-$t$-wise-supporting CSPs

In the case that a predicate is not $t$-wise supporting, a weaker notion of quasirandomness suffices
to obtain $\Omega(1)$-refutation.

Definition 4.7. An instance $\mathcal{I}$ of CSP($P$) is $(\epsilon, t)$-quasirandom if $\mathcal{D}_{\mathcal{I},x}$ is $(\epsilon,t)$-wise uniform for
every $x \in \{-1,1\}^n$.

Fact 3.6 shows that random instances with 0{n) constraints are (o(l), t)-quasirandom for all 
t < k with high probability. Lemma 4.3 directly implies that we can certify (e, t)-quasirandomness 
when m > 


Theorem 4.8. There is an efficient algorithm that certifies that an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of
CSP($P$) is $(\gamma,t)$-quasirandom with high probability when $\overline{m} \ge \frac{2^{O(k)} n^{t/2} \log^5 n}{\gamma^2}$ and $t \ge 2$.

We now reach the main result of this section, which states that if a predicate is $\delta$-far from $t$-wise
supporting, then we can almost $\delta$-refute instances of CSP($P$).

Theorem 4.9. Let $P$ be $\delta$-far from $t$-wise supporting. Then there is an efficient algorithm that,
given an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of CSP($P$), certifies that $\mathrm{Opt}(\mathcal{I}) \le 1 - \delta + \gamma$ with high probability
when $\overline{m} \ge$ […] and $t \ge 2$.

We give two proofs of this theorem. In Proof 1, the theorem follows directly from certification
of $(\gamma, t)$-quasirandomness and Lemma 3.13.

Proof 1. Run the algorithm of Theorem 4.8 to certify that $\mathcal{I}$ is $(\gamma/k^t, t)$-quasirandom with high
probability. By definition, we have certified that $\mathcal{D}_{\mathcal{I},x}$ is $(\gamma/k^t, t)$-wise uniform for all $x \in \{-1,1\}^n$.
Lemma 3.13 then implies that for all $x$ there exists a $t$-wise uniform distribution $\mathcal{D}'_x$ such that
$d_{\mathrm{TV}}(\mathcal{D}_{\mathcal{I},x}, \mathcal{D}'_x) \le \gamma$. Now define $\mathcal{D}_{\mathrm{sat}}$ to be an arbitrary distribution over satisfying assignments to
$P$. Since $P$ is $\delta$-far from being $t$-wise supporting, we know that $d_{\mathrm{TV}}(\mathcal{D}', \mathcal{D}_{\mathrm{sat}}) \ge \delta$ for any $t$-wise
uniform distribution $\mathcal{D}'$. The triangle inequality then implies that $d_{\mathrm{TV}}(\mathcal{D}_{\mathcal{I},x}, \mathcal{D}_{\mathrm{sat}}) \ge \delta - \gamma$ for all
$x \in \{-1,1\}^n$ and the theorem follows. □


Proof 2 gives a slightly weaker version of Theorem 4.9, requiring the stronger assumption that
$\overline{m} \ge$ […]. It uses the dual polynomial characterization of being $\delta$-far from $t$-wise
supporting. While perhaps less intuitive than Proof 1, Proof 2 is more direct. It only uses the
XOR refutation algorithm and bypasses [AGM03]’s connection between $(\epsilon,t)$-wise uniformity and
$\epsilon$-closeness to a $t$-wise uniform distribution. We were able to convert Proof 2 into an SOS proof
(see Section 6.4), but we did not see how to translate Proof 1 into an SOS version. Proof 2 requires
Plancherel’s Theorem, a fundamental result in Fourier analysis.

Theorem 4.10 (Plancherel’s Theorem). For any $f, g : \{-1,1\}^k \to \mathbb{R}$,

$$\mathbf{E}_{z \sim \{-1,1\}^k}[f(z)\, g(z)] = \sum_{S \subseteq [k]} \widehat{f}(S)\, \widehat{g}(S).$$
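
Plancherel’s Theorem is easy to sanity-check numerically on a small cube. The sketch below is illustrative only: the helper names are hypothetical, exact rational arithmetic is used to avoid rounding, and the two test functions (3-bit majority and a 3-bit OR with $-1$ read as "true") are assumptions chosen for the example.

```python
import itertools
from fractions import Fraction
from math import prod

def fourier_coefficients(f, k):
    """All coefficients hat f(S) = E_z[f(z) * prod_{i in S} z_i], as exact fractions."""
    cube = list(itertools.product((-1, 1), repeat=k))
    subsets = [S for size in range(k + 1)
               for S in itertools.combinations(range(k), size)]
    return {S: sum(Fraction(f(z)) * prod(z[i] for i in S) for z in cube) / len(cube)
            for S in subsets}

def plancherel_sides(f, g, k):
    """Return (E_z[f(z) g(z)], sum_S hat f(S) hat g(S))."""
    cube = list(itertools.product((-1, 1), repeat=k))
    lhs = sum(Fraction(f(z)) * Fraction(g(z)) for z in cube) / len(cube)
    f_hat, g_hat = fourier_coefficients(f, k), fourier_coefficients(g, k)
    rhs = sum(f_hat[S] * g_hat[S] for S in f_hat)
    return lhs, rhs

def maj3(z):
    return int(sum(z) > 0)

def or3(z):
    # Assumed convention: -1 plays the role of "true".
    return int(any(z_i == -1 for z_i in z))

lhs, rhs = plancherel_sides(maj3, or3, 3)
```

Both sides equal $3/8$ here, since the product of the two functions is $1$ on exactly three of the eight points.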


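Proof 2. Since $P$ is $\delta$-far from $t$-wise supporting, there exists a degree-$t$ polynomial $Q$ that
$\delta$-separates $P$. The definition of $\delta$-separating implies that $P(z) - (1 - \delta) \le Q(z)$ for all $z \in \{-1,1\}^k$.
Summing over all constraints, we get that for all $x \in \{-1,1\}^n$,

$$\sum_{T \in [n]^k} \sum_{c \in \{\pm 1\}^k} 1_{\{(T,c) \in \mathcal{I}\}}\, P(x_T \circ c) - m(1 - \delta) \le \sum_{T \in [n]^k} \sum_{c \in \{\pm 1\}^k} 1_{\{(T,c) \in \mathcal{I}\}}\, Q(x_T \circ c),$$

or, equivalently, $\mathrm{Val}_{\mathcal{I}}(x) - (1 - \delta) \le \mathbf{E}_{z \sim \mathcal{D}_{\mathcal{I},x}}[Q(z)]$.

It then remains to certify that $\mathbf{E}_{z \sim \mathcal{D}_{\mathcal{I},x}}[Q(z)] \le \gamma$. Observe that

$$\mathbf{E}_{z \sim \mathcal{D}_{\mathcal{I},x}}[Q(z)] = \mathbf{E}_{z \sim U^k}[D_{\mathcal{I},x}(z)\, Q(z)] = \sum_{S \subseteq [k],\ 0 < |S| \le t} \widehat{D}_{\mathcal{I},x}(S)\, \widehat{Q}(S),$$

where the second equality follows from Plancherel’s Theorem. Since $Q \ge -1$ and $\mathbf{E}[Q] = 0$,
$Q \le 2^k$ and hence $|\widehat{Q}(S)| \le 2^k$ for all $S$. To finish the proof, we apply Lemma 4.3 to certify that
$|\widehat{D}_{\mathcal{I},x}(S)| \le$ […] for all $S$. □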

With Corollary 3.17, Theorem 4.9 implies that we can $\Omega_k(1)$-refute instances of CSP($P$) with
$2^{\widetilde{O}(k^t)} n^{t/2} \log^5 n$ constraints when $P$ is not $t$-wise supporting.

Corollary 4.11. Let $P$ be a predicate that does not support any $t$-wise uniform distribution. Then
there is an efficient algorithm that, given an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of CSP($P$), certifies that
$\mathrm{Opt}(\mathcal{I}) \le 1 - 2^{-\widetilde{O}(k^t)}$ with high probability when $\overline{m} \ge 2^{\widetilde{O}(k^t)} n^{t/2} \log^5 n$ and $t \ge 2$.


4.4 Proof of Lemma 4.4 

Recall the statement of the lemma. 


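Lemma 4.4. Let $S \subseteq [k]$ with $|S| = s > 0$. Let $r \in \mathbb{N}$ and let $\{w_U(i)\}_{U \in [n]^s,\, i \in [r]}$ be independent
random variables satisfying conditions (6), (7), and (8) for some $p \ge$ […]. Then there is an
algorithm that certifies with high probability that

$$\sum_{U \in [n]^s} x^U \sum_{i=1}^{r} w_U(i) \le \begin{cases} 2^{O(s)} \sqrt{rp}\; n^{3s/4} \log^{5/2} n & \text{if } s \ge 2 \\ 4 \max\{\sqrt{rp}, 1\} \cdot n \log n & \text{if } s = 1 \end{cases}$$

for all $x \in \mathbb{R}^n$ such that $\|x\|_\infty \le 1$.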

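The proof uses Bernstein’s Inequality.

Theorem 4.12 (Bernstein’s Inequality). Let $X_1, \ldots, X_M$ be independent 0-mean random variables
such that $|X_i| \le B$. Then, for $a > 0$,

$$\Pr\left[ \sum_{i=1}^{M} X_i > a \right] \le \exp\left( - \frac{\frac{1}{2} a^2}{\sum_{i=1}^{M} \mathbf{E}[X_i^2] + \frac{1}{3} B a} \right).$$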
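
As a quick sanity check of the standard form of Bernstein’s Inequality (bound $\exp(-\tfrac{1}{2}a^2 / (\sum_i \mathbf{E}[X_i^2] + \tfrac{1}{3}Ba))$), the Python sketch below compares the bound against the exact upper tail of a sum of independent uniform $\pm 1$ variables. This is an illustration with hypothetical function names; the choice of $\pm 1$ variables (so that $\mathbf{E}[X_i^2] = 1$ and $B = 1$) is an assumption made so that the exact tail can be computed from binomial coefficients.

```python
from math import comb, exp

def bernstein_bound(a, second_moment_sum, B):
    """Bernstein upper bound on Pr[X_1 + ... + X_M > a]."""
    return exp(-0.5 * a * a / (second_moment_sum + B * a / 3.0))

def rademacher_tail(M, a):
    """Exact Pr[X_1 + ... + X_M > a] for independent uniform +-1 variables."""
    # If h of the M variables equal +1, the sum is 2h - M.
    favourable = sum(comb(M, h) for h in range(M + 1) if 2 * h - M > a)
    return favourable / 2 ** M

exact = rademacher_tail(100, 20)
bound = bernstein_bound(20, 100, 1)
```

The exact tail sits well below the bound in this regime, which is typical: Bernstein trades tightness for a form that depends only on the second moments and the magnitude bound $B$.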


Proof of Lemma 4.4. First, we define

$$v_U = \sum_{i=1}^{r} w_U(i).$$

Observe that the $v_U$’s are independent and that each one is the sum of $r$ mean-0, i.i.d. random
variables with magnitude at most 1. Noting that $\sum_{i=1}^{r} \mathbf{E}[w_U(i)^2] \le rp$, we can use Bernstein’s
Inequality to show that the $|v_U|$’s are not too big with high probability. If $s \ge 2$, Theorem 4.1 then
implies that the desired algorithm exists. If $s = 1$, we are simply bounding a linear function over
$\pm 1$ variables. We consider two cases: small $p$ and large $p$.

Case 1: $p < 1/r$. Choosing $a = 2s \log n$ in Bernstein’s Inequality, we see that $\Pr[|v_U| > 2s \log n] \le n^{-2s}$.
A union bound over all $U$ then implies that $\Pr[\text{any } |v_U| > 2s \log n] \le n^{-s}$. If $s \ge 2$, we
observe that $\Pr[v_U \ne 0] \le rp$, scale the $v_U$’s down by $2s \log n$, and apply Theorem 4.1 to get the
stated result. If $s = 1$, we obtain the second bound by observing that

$$\sum_{i \in [n]} v_i x_i \le \sum_{i \in [n]} |v_i| \le 2 n \log n. \qquad (10)$$

Case 2: $p \ge 1/r$. We set $a = 4s\sqrt{rp} \log n$ and get that $\Pr[\text{any } |v_U| > 4s\sqrt{rp} \log n] \le n^{-s}$ as
above. If $s \ge 2$, we can then divide the $v_U$’s by $4s\sqrt{rp} \log n$ and apply Theorem 4.1. If $s = 1$, we
get a bound of $4\sqrt{rp} \cdot n \log n$ in the same way as (10). □


5 Hardness of learning implications 

Recent work by Daniely et al. [DLSS14] reduces the problem of refuting specific instances of CSP($P$)
to the problem of improperly learning certain hypothesis classes in the Probably Approximately
Correct (PAC) model [Val84]. In this model, the learner is given $m$ labeled training examples
$(x_1, \ell(x_1)), \ldots, (x_m, \ell(x_m))$, where each $x_i \in \{-1,1\}^n$, each $\ell(x_i) \in \{0,1\}$, and the examples
are drawn from some unknown distribution $\mathcal{D}$ on $\{-1,1\}^n \times \{0,1\}$. For some hypothesis class
$\mathcal{H} \subseteq \{0,1\}^{\{-1,1\}^n}$, we say that $\mathcal{D}$ can be realized by $\mathcal{H}$ if there exists some $h \in \mathcal{H}$ such that
$\Pr_{(x,\ell(x)) \sim \mathcal{D}}[h(x) \ne \ell(x)] = 0$. In improper PAC learning, on an input of $m$ training examples
drawn from $\mathcal{D}$ such that $\mathcal{D}$ can be realized by some $h \in \mathcal{H}$, and an error parameter $\epsilon$, the
algorithm outputs some hypothesis function $f_h : \{-1,1\}^n \to \{0,1\}$ (not necessarily in $\mathcal{H}$) such that
$\Pr_{(x,\ell(x)) \sim \mathcal{D}}[f_h(x) \ne \ell(x)] \le \epsilon$. In improper agnostic PAC learning, the assumption that $\mathcal{D}$ can
be realized by some $h \in \mathcal{H}$ is removed and the algorithm must output a hypothesis that performs
almost as well as the best hypothesis in $\mathcal{H}$. More formally, the hypothesis $f_h$ must satisfy the
following: $\Pr_{(x,\ell(x)) \sim \mathcal{D}}[f_h(x) \ne \ell(x)] \le \min_{h \in \mathcal{H}} \Pr_{(x,\ell(x)) \sim \mathcal{D}}[h(x) \ne \ell(x)] + \epsilon$. In improper approximate
agnostic PAC learning, the learner is also given an approximation factor $\alpha \ge 1$ and must output a
hypothesis $f_h$ such that $\Pr_{(x,\ell(x)) \sim \mathcal{D}}[f_h(x) \ne \ell(x)] \le \alpha \cdot \min_{h \in \mathcal{H}} \Pr_{(x,\ell(x)) \sim \mathcal{D}}[h(x) \ne \ell(x)] + \epsilon$.

Daniely et al. reduce the problem of distinguishing between random instances of CSP($P$) and
instances with value at least $\alpha$ to a PAC learning problem by transforming each constraint into a
labeled example. To show hardness of improperly learning a certain hypothesis class in the PAC
model, they define a predicate $P$ that is specific to the hypothesis class and assume hardness of
distinguishing between random instances of CSP($P$) and instances with $n^d$ constraints and value
at least $\alpha$ for all $d > 0$. They then demonstrate that the sample can be realized (or approximately
realized) by some function in the hypothesis class if the CSP instance is satisfiable (or has value 
at least a). They also show that if the given CSP instance is random, the set of examples will 
have error at least | (in the agnostic case |) for all h in the hypothesis class with high probability. 
Using this approach, they obtain hardness results for the following problems: improperly learning 
DNF formulas, improperly learning intersections of 4 halfspaces, and improperly approximately 
agnostically learning halfspaces for any approximation factor. 

5.1 Hardness assumptions 

The hardness assumptions made in [DLSS14] are the same as those presented in Section 2.1, except 
for a few minor differences. First, their model fixes the number of constraints rather than the 
probability with which each constraint is included in the instance. It is well-known that results 
in one model easily translate to the other. We include a proof in Appendix D for completeness. 
Additionally, SRCSP Assumptions 1 and 2 purport hardness of distinguishing random instances of
CSP($P$) from satisfiable instances, even when the algorithm is allowed to err with probability $\frac{1}{4}$
over its internal coins. The algorithms presented in the preceding sections never err on satisfiable
instances; further, they only fail to certify random instances with probability $o(1)$. As a result,
our refutation algorithms also falsify weaker versions of both SRCSP Assumptions, wherein the
allowed probability of error is both lower and one-sided. For each predicate presented in [DLSS14],
we falsify the appropriate SRCSP assumption using the following approach. For each predicate $P$
and corresponding $\delta > 0$, we define a degree-$t$ polynomial that $\delta$-separates $P$. Using the refutation
techniques presented in the preceding sections, we deduce that $\widetilde{O}(n^{t/2})$ constraints are sufficient
to distinguish random instances of CSP($P$) from those that are satisfiable (or have value at least
$\alpha$). In order to simplify the presentation, we begin with simpler versions of the polynomials and
then scale them to attain the appropriate values of $\delta$. The following lemma will be of use for this
scaling.

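Lemma 5.1. For predicate $P : \{-1,1\}^k \to \{0,1\}$, let $Q : \{-1,1\}^k \to \mathbb{R}$ be an unbiased multilinear
polynomial of degree $t$ such that there exist $\theta_1 > 0$, $\theta_0 < 0$ not dependent on $z$ for which the following
holds: $Q(z) \ge \theta_1$ for all $z \in P^{-1}(1)$ and $Q(z) \ge \theta_0$ for all $z \in \{-1,1\}^k$. Then there exists a degree-$t$
polynomial $\widetilde{Q} : \{-1,1\}^k \to \mathbb{R}$ that $\frac{\theta_1}{\theta_1 - \theta_0}$-separates $P$.

Proof. Define $\widetilde{Q}(z) = \frac{Q(z)}{\theta_1 - \theta_0}$. Clearly $\widetilde{Q}$ is also unbiased and has degree $t$. Then for all $z \in P^{-1}(1)$,
$\widetilde{Q}(z) \ge \frac{\theta_1}{\theta_1 - \theta_0}$. Similarly, for all $z$, $\widetilde{Q}(z) \ge \frac{\theta_0}{\theta_1 - \theta_0} = -1 + \frac{\theta_1}{\theta_1 - \theta_0}$.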

We now demonstrate that the above can be applied to the predicates suggested in [DLSS14] by 
defining separating polynomials and applying Theorem 4.9 

5.2 Huang’s predicate and hardness of learning DNF formulas 

In order to obtain hardness of improperly learning DNF formulas with $\omega(1)$ terms, Daniely et al.
use the following predicate, introduced by Huang [Hua13]. Huang showed that it is hereditarily
approximation resistant; Daniely et al. also observed that its 0-variability is [DLSS14]. 

Definition 5.2. Let $k = \kappa + \binom{\kappa}{3}$ for some integer $\kappa \ge 3$. For $z \in \{-1,1\}^k$, index $z$ as follows. Label
the first $\kappa$ bits of $z$ as $z_1, \ldots, z_\kappa$. The remaining $\binom{\kappa}{3}$ bits are indexed by unordered triples of integers
between 1 and $\kappa$. Each $T \subseteq [\kappa]$ with $|T| = 3$ is associated with a distinct bit of the remaining
$\binom{\kappa}{3}$ bits, which is indexed by $z_T$. We say that $z$ strongly satisfies the Huang predicate iff for every
$T = \{i, j, \ell\}$ such that $i, j, \ell$ are distinct elements of $[\kappa]$, $z_i z_j z_\ell = z_T$. Additionally, we say
that $z$ satisfies the Huang predicate iff there exists some $z' \in \{-1,1\}^k$ such that $z$ has Hamming
distance at most $\kappa$ from $z'$ and $z'$ strongly satisfies the Huang predicate. Define $H_\kappa : \{-1,1\}^k \to
\{0,1\}$ as follows: $H_\kappa(z) = 1$ if $z$ satisfies the Huang predicate and $H_\kappa(z) = 0$ otherwise.

Daniely et al. reduce the problem of distinguishing between random instances of CSP(Pk) 
with 2n'^ constraints and satisfiable instances to the problem of improperly PAC learning the class 
of DNF formulas with ti;(l) terms on a sample of 0{n‘^) training examples with error e = 1/5 
with probability at least |. Here we show that there exists a polynomial time algorithm that 
refutes random instances of CSP(Pk) by demonstrating that does not support a 4-wise uniform 
distribution and applying Theorem 4.9. 

Theorem 5.3. Assume $\kappa \ge 9$. There exists a degree-4 polynomial $Q : \{-1,1\}^k \to \mathbb{R}$ that
$\frac{1}{8}$-separates $H_\kappa$. Consequently, $H_\kappa$ is $\frac{1}{8}$-far from supporting a 4-wise uniform distribution.
Proof. As a notational shorthand, write $z_{abc}$ for $z_{\{i_a, i_b, i_c\}}$. Define $\zeta$, taking values in $[-5,5]$, as
follows:

$$\begin{aligned}
\zeta(i_1, i_2, i_3, i_4, i_5, i_6; z) ={} & z_{126} z_{134} z_{235} z_{456} + z_{256} z_{146} z_{345} z_{123} + z_{136} z_{236} z_{145} z_{245} \\
& + z_{124} z_{234} z_{356} z_{156} + z_{125} z_{135} z_{346} z_{246}. \qquad (11)
\end{aligned}$$

Observe that for each monomial $z_{T_1} z_{T_2} z_{T_3} z_{T_4}$ of $\zeta$, for every $j \in [6]$, $\sum_{i=1}^{4} 1_{\{T_i \ni j\}} = 2$. Further,
for each $T \subseteq [6]$ with $|T| = 3$, $z_T$ appears exactly once in $\zeta$. Let $\mathcal{I}_6$ be the set of all ordered 6-tuples
of distinct elements of $[\kappa]$. For an ordered tuple $I$, we use $\in$ to denote membership in $I$.
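
The two combinatorial observations about (11) can be verified mechanically. The Python sketch below is an illustration with hypothetical names; it encodes the five monomials by their index triples over $\{1,\ldots,6\}$, checks that every index occurs exactly twice in each monomial and that each of the $\binom{6}{3} = 20$ triples occurs exactly once overall, and then evaluates the sum on assignments where each triple variable equals the product of its three singleton variables (the "strongly satisfying" pattern of Definition 5.2, restricted to six indices).

```python
import itertools
from math import prod

# Index triples of the five monomials in (11), written over {1, ..., 6}.
MONOMIALS = [
    ("126", "134", "235", "456"),
    ("256", "146", "345", "123"),
    ("136", "236", "145", "245"),
    ("124", "234", "356", "156"),
    ("125", "135", "346", "246"),
]

def triples(monomial):
    return [frozenset(int(ch) for ch in label) for label in monomial]

def zeta(z_triple):
    """Evaluate the five-monomial sum given a dict: triple -> value in {-1, 1}."""
    return sum(prod(z_triple[T] for T in triples(mono)) for mono in MONOMIALS)

all_triples = [T for mono in MONOMIALS for T in triples(mono)]
each_index_twice = all(
    sum(j in T for T in triples(mono)) == 2
    for mono in MONOMIALS for j in range(1, 7)
)

def strongly_satisfying(z_single):
    """Set every triple variable to the product of its singleton variables."""
    return {frozenset(T): prod(z_single[j - 1] for j in T)
            for T in itertools.combinations(range(1, 7), 3)}

values = {zeta(strongly_satisfying(z))
          for z in itertools.product((-1, 1), repeat=6)}
```

Because every index appears an even number of times in each monomial, each monomial evaluates to $1$ on such assignments, so the sum is always $5$.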

Define Q : { — 1,1}*^ —)• IR as follows. Our final polynomial Q will be a scaled version of Q. 


Q{z) = avg C(l', 2 ). 

1 &Z 3 


Observe that Q does not depend on any of 2 {i},... 2 {^}. By construction, Q contains no constant 
term, so Q(0) = 0. Clearly Q{z) > —5 for all 2 because (11) is always at least —5. 

Now we lower bound the value of Q on all 2 that satisfy We first show that for any z! 
that strongly satisfies the Huang predicate, ( 5 ( 2 ') = 5, then bound ( 5 ( 2 ') — Q{z') for any z with 
Hamming distance at most k from z'. By definition, for each 2 ^,, we have that OjeTi - 1 - 
So for each monomial of Q, 


I or I 2 Ti2t2 2T3 2T4 
| 2 - 6 | 




i=ljeTi 


n 


iZfil 

' ' jeTiUT2UT3ur4 


1 

1 ^’ 


where the last line follows from the fact that X)i=i ^{TiBj} = 2. Because there are 5- monomials 
in Q, their sum is 5. 

Now we consider the case where 2 does not strongly satisfy the Huang Predicate, but Hi^{z) = 1. 
Any singleton index on which 2 ; and z' differ will not change the value of Q. Let N = {T : zt 7 ^ 2 ^}. 
We lower bound Q by counting the number of monomials in which each zt appears and 

1^61 


For fixed T, the number of monomials containing the variables of zt is 

E = 120(« - 3)(« - 4)(« - 5) 

i&z 

because there are exactly 120 ways to permute the three indices of T in I and the remaining k — 3 
indices are permuted in the remaining 3 positions of I. So 

Q{z) > 5 - - 3)(k - 4)(k - 5) = 5 - - -^. (12) 

|Z6| [k-1)[k-2) 

For K > 9, (12) is at least 5 — Applying Lemma 5.1, there exists Q : {—1,1}^ —)• IR that 
^-separates □ 

From this and the fact that FFr = (see [Hual3]), we obtain the following corollary. 
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Corollary 5.4. For sufficiently large n and k > 93, there exists an efficient algorithm that re¬ 
futes random instances of CSP{Hf^) with 0{n^) constraints with high probability. This falsifies 
Assumption 2.1 in the case of the Huang predicate. 

Remark 5.5. If we instead choose to scale Q hy a factor of ^ • 2 ^^ 16^144 rather than substituting 
K = 9 into (12), we can achieve a better separation of <5 = For k > 9, this expression is 

strictly increasing and it approaches ^ as k grows. 


5.3 Hamming weight predicates 


The remaining predicates we would like to examine are symmetric, meaning they are functions only 
of their Hamming weights. Again for each predicate P we present a multivariate polynomial that 
5-separates P for some 0 < 5 < 1. Each of these polynomials can also be written as a univariate 
polynomial on the Hamming weight of its input, which we will use to show that each of the following 
polynomials 5-separates its predicate for the appropriate value of 5. We give the construction below. 

Definition 5.6. For $z \in \{-1,1\}^k$ where $z = z_1, \ldots, z_k$, define $S_z = \sum_{i=1}^{k} z_i$. We call $S_z$ the
Hamming weight of $z$.

Note that this is analogous to the notion of a Hamming weight of a vector in $\{0,1\}^k$, but differs
in that it is not simply the count of the number of 1’s. We define a general predicate that is satisfied
when $S_z$ is at least some fixed threshold value $\theta$.

Definition 5.7. For all odd $k$ and any $\theta \in \{-k, -k+2, \ldots, k-2, k\}$, define the predicate
$\mathrm{Thr}_k^{\theta} : \{-1,1\}^k \to \{0,1\}$ as follows:

$$\mathrm{Thr}_k^{\theta}(z) = \begin{cases} 1 & \text{if } S_z \ge \theta \\ 0 & \text{otherwise.} \end{cases}$$

For example, $\mathrm{Maj}_k$ is the same as $\mathrm{Thr}_k^{1}$, and $\mathrm{Thr}_k^{-k}$ is the trivial predicate satisfied by all
$z \in \{-1,1\}^k$.

Because the multilinear separating polynomials we will use are symmetric, we present a trans¬ 
formation to an equivalent univariate polynomial on the Hamming weight of the original input. 


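Lemma 5.8. Let $Q : \{-1,1\}^k \to \mathbb{R}$ be of the following form for some $a, b, c, d \in \mathbb{R}$:

$$Q(z) = a \sum_{\substack{T \subseteq [k] \\ |T| = 1}} z^T + b \sum_{\substack{T \subseteq [k] \\ |T| = 2}} z^T + c \sum_{\substack{T \subseteq [k] \\ |T| = 3}} z^T + d \sum_{\substack{T \subseteq [k] \\ |T| = 4}} z^T. \qquad (13)$$

Define $Q_u : \mathbb{R} \to \mathbb{R}$ as follows:

$$Q_u(s) = \frac{d}{24} s^4 + \frac{c}{6} s^3 + \left( \frac{b}{2} + \frac{d}{3} - \frac{dk}{4} \right) s^2 + \left( a + \frac{c}{6}(-3k + 2) \right) s - \frac{bk}{2} + \frac{dk}{24}(3k - 6).$$

Then $Q(z) = Q_u(S_z)$ for all $z \in \{-1,1\}^k$.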
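Proof. We can write (13) as follows:

$$Q(z) = a\, \mathcal{K}_1\!\left( \tfrac{k - S_z}{2}; k \right) + b\, \mathcal{K}_2\!\left( \tfrac{k - S_z}{2}; k \right) + c\, \mathcal{K}_3\!\left( \tfrac{k - S_z}{2}; k \right) + d\, \mathcal{K}_4\!\left( \tfrac{k - S_z}{2}; k \right), \qquad (14)$$

where $\mathcal{K}_i(u; k) = \sum_{j=0}^{i} (-1)^j \binom{u}{j} \binom{k - u}{i - j}$ denotes the Krawtchouk polynomial of degree $i$ [Kra29,
KL96]. Substituting $u = \frac{k - S_z}{2}$ yields the following expressions. In [KL96] the first three expressions
are given explicitly and the fourth can be easily obtained by applying their recursive formula.

$$\begin{aligned}
\mathcal{K}_1\!\left( \tfrac{k - S_z}{2}; k \right) &= S_z, \\
\mathcal{K}_2\!\left( \tfrac{k - S_z}{2}; k \right) &= \frac{S_z^2 - k}{2}, \\
\mathcal{K}_3\!\left( \tfrac{k - S_z}{2}; k \right) &= \frac{S_z^3 - (3k - 2) S_z}{6}, \\
\mathcal{K}_4\!\left( \tfrac{k - S_z}{2}; k \right) &= \frac{S_z^4 + (8 - 6k) S_z^2 + 3k^2 - 6k}{24}.
\end{aligned}$$

Finally, substituting these expressions into (14) and by some algebra,

$$Q(z) = \frac{d}{24} S_z^4 + \frac{c}{6} S_z^3 + \left( \frac{b}{2} + \frac{d}{3} - \frac{dk}{4} \right) S_z^2 + \left( a + \frac{c}{6}(-3k + 2) \right) S_z - \frac{bk}{2} + \frac{dk}{24}(3k - 6). \qquad (15)$$

□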


As a consequence, by choosing values of a, b, c, and d, we can work with a univariate polynomial 
while ensuring that its multivariate analogue is unbiased and has degree at most 4 (degree 3 when 
d = 0). 
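Lemma 5.8 is easy to sanity-check by brute force for small k. The sketch below is our own illustration (the helper names are hypothetical); it evaluates the multivariate form (13) directly and compares it with the univariate polynomial in $S_z$, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def q_multivariate(z, a, b, c, d):
    # Evaluate (13): coefficient times the sum of all monomials z^T with |T| = size.
    total = 0
    for size, coeff in ((1, a), (2, b), (3, c), (4, d)):
        total += coeff * sum(prod(z[i] for i in T)
                             for T in combinations(range(len(z)), size))
    return total

def q_univariate(s, k, a, b, c, d):
    # Evaluate Q_u(s) as stated in Lemma 5.8, with exact fractions.
    a, b, c, d = map(Fraction, (a, b, c, d))
    return (d / 24 * s**4 + c / 6 * s**3
            + (b / 2 + d / 3 - d * k / 4) * s**2
            + (a + c / 6 * (-3 * k + 2)) * s
            - b * k / 2 + d * k / 24 * (3 * k - 6))

def lemma_5_8_holds(k, a, b, c, d):
    # Compare both sides on every point of {-1,1}^k.
    return all(q_multivariate(z, a, b, c, d) == q_univariate(sum(z), k, a, b, c, d)
               for z in product((-1, 1), repeat=k))
```

The check is exponential in k, so it is only meant for small cases such as k = 5 or k = 7.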


5.3.1 Almost-Majority and hardness of learning intersections of halfspaces 

Definition 5.9. Daniely et al. define the following predicate in order to show hardness of improperly
learning intersections of four halfspaces.

$$I_{8k}(z) = \left(\bigwedge_{i=0}^{3} \mathrm{Thr}_k^{-1}(z_{ki+1} \cdots z_{ki+k})\right) \wedge \neg\left(\bigwedge_{i=4}^{7} \mathrm{Thr}_k^{-1}(z_{ki+1} \cdots z_{ki+k})\right)$$

The reduction relies on the assumption that for all $d > 0$, it is hard to distinguish random
instances of CSP($I_{8k}$) with $n^d$ constraints from satisfiable instances. Because the input variables to
each instance of $\mathrm{Thr}_k^{-1}$ above are disjoint, it is sufficient to show that $\mathrm{Thr}_k^{-1}$ on each of the first four groups of $k$
variables cannot support a 3-wise uniform distribution, and consequently neither can $I_{8k}$. Therefore,
from Theorem 4.9 we deduce that there exists an efficient algorithm that refutes random instances
of CSP($I_{8k}$) with $\tilde{O}(n^{3/2})$ constraints with high probability. Daniely et al. define a pairwise uniform
distribution supported on $I_{8k}$ as well as a pairwise uniform distribution supported on $\mathrm{Thr}_k^{-1}$, so
$t = 3$ is optimal.

Theorem 5.10. Assume $k \ge 5$ and $k$ is odd. There exist $\delta = \delta(k) > 0$, where $\delta$ is $\Omega(k^{-4})$, and
a degree-3 multilinear polynomial $Q : \{-1,1\}^k \to \mathbb{R}$ that $\delta$-separates $\mathrm{Thr}_k^{-1}$. Consequently, $\mathrm{Thr}_k^{-1}$
does not support a 3-wise uniform distribution.



Proof. Let

$$Q(z) = (k^2 - k - 1) \sum_{\substack{T \subseteq [k] \\ |T|=1}} z^T + (1-k) \sum_{\substack{T \subseteq [k] \\ |T|=2}} z^T + (1+k) \sum_{\substack{T \subseteq [k] \\ |T|=3}} z^T$$

and define $Q_u : \mathbb{R} \to \mathbb{R}$ as follows:

$$Q_u(s) = \frac{1+k}{6} s^3 + \frac{1-k}{2} s^2 + \left(k^2 - k - 1 + \frac{1+k}{6}(-3k+2)\right) s - \frac{(1-k)k}{2}$$

$$\phantom{Q_u(s)} = \frac{1+k}{6} s^3 + \frac{1-k}{2} s^2 + \frac{3k^2 - 7k - 4}{6}\, s - \frac{(1-k)k}{2}.$$

Then by Lemma 5.8, for all $z \in \{-1,1\}^k$, $Q(z) = Q_u(S_z)$. It therefore suffices to lower bound
$Q_u(s)$ both when $s \ge -1$ and for all $s \in [-k, k]$.

First we show that $Q_u$ is monotonically increasing in $s$.

$$\frac{dQ_u}{ds} = \frac{k+1}{2} s^2 + (1-k) s + \frac{3k^2 - 7k - 4}{6} = \frac{1}{6}\left[(k-4)\left(3(s-1)^2 + 2 + 3k\right) + 15\left(\left(s - \tfrac{3}{5}\right)^2 + \tfrac{53}{75}\right)\right],$$

which is evidently positive for $k \ge 5$.

Because $Q_u$ is monotonically increasing in $s$, $Q_u(s) \ge Q_u(-k)$ for all $s \in [-k, k]$.

$$Q_u(-k) = \frac{-k-1}{6} k^3 + \frac{1-k}{2} k^2 - \frac{3k^2 - 7k - 4}{6}\, k - \frac{(1-k)k}{2} \qquad (16)$$

$$\phantom{Q_u(-k)} = -\frac{1}{6}\left[k^4 + 7k^3 - 13k^2 - k\right] = -\frac{1}{6}\left[k(k-2)(k^2 + 9k + 5) + 9k\right], \qquad (17)$$

which is clearly negative for $k \ge 5$. Now it just remains to lower-bound $Q_u(s)$ for $s \ge -1$. Again,
since $Q_u$ is monotonically increasing in $s$, we use the value $Q_u(-1)$.

$$Q_u(-1) = \frac{-k-1}{6} + \frac{1-k}{2} - \frac{3k^2 - 7k - 4}{6} - \frac{(1-k)k}{2} = 1.$$

By applying Lemma 5.1, there exists an unbiased multilinear polynomial $Q : \{-1,1\}^k \to \mathbb{R}$ of
degree 3 that $\Omega(k^{-4})$-separates $\mathrm{Thr}_k^{-1}$. □
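The three facts used in this proof (monotonicity on the attainable weights, the value at $s = -1$, and the closed form at $s = -k$) can be checked numerically. The following is a minimal sketch with names of our own choosing; it assumes the univariate polynomial $Q_u$ obtained from Lemma 5.8 with $a = k^2-k-1$, $b = 1-k$, $c = 1+k$, $d = 0$.

```python
from fractions import Fraction

def qu_thm_5_10(s, k):
    # Q_u(s) for the degree-3 polynomial in the proof of Theorem 5.10.
    s, k = Fraction(s), Fraction(k)
    return ((1 + k) / 6 * s**3 + (1 - k) / 2 * s**2
            + (3 * k**2 - 7 * k - 4) / 6 * s - (1 - k) * k / 2)

def check_thm_5_10(k):
    # Values on the attainable Hamming weights -k, -k+2, ..., k (k odd).
    values = [qu_thm_5_10(s, k) for s in range(-k, k + 1, 2)]
    increasing = all(x < y for x, y in zip(values, values[1:]))
    # Closed form (17) for Q_u(-k).
    closed_form = -Fraction(k**4 + 7 * k**3 - 13 * k**2 - k, 6)
    return increasing, qu_thm_5_10(-1, k), qu_thm_5_10(-k, k) == closed_form
```

Since the value on satisfying inputs is at least 1 while the global lower bound is of order $-k^4$, the resulting separation parameter scales like $k^{-4}$, in line with the theorem statement.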

Because VARo(I 8A:) is evidently fl(/c) and Igk < y for all /c > 5, we have the following Corollary. 

Corollary 5.11. For odd $k \ge 5$ and sufficiently large $n$, there exists an efficient algorithm that
distinguishes between random instances of CSP($I_{8k}$) with $\tilde{O}(n^{3/2})$ constraints and satisfiable instances
with high probability.

Remark 5.12. $\mathrm{Thr}_3^{-1}$ is the same as 3-OR and $\mathrm{Thr}_5^{-1}$ is the same as 2-out-of-5-SAT,
so this approach can be used to $\Omega_k(1)$-refute 3-SAT instances and 2-out-of-5-SAT instances
with $\tilde{O}_k(n^{3/2})$ constraints, which improves upon the $O(n^{3/2+\epsilon})$ constraints required for refutation
of 2-out-of-5-SAT in [GJ02, GJ03].

5.4 Majority and hardness of approximately agnostically learning halfspaces 

Daniely et al. show that approximate agnostic improper learning of halfspaces is hard for all approximation
factors $\phi \ge 1$ based on the assumption that for all $d > 0$ and for sufficiently large
odd $k$, it is hard to distinguish between random instances of CSP($\mathrm{Thr}_k^1$) with $n^d$ constraints and
instances with value at least 1 − . This is based on the fact that $\max_{\mathcal{D}} \mathbf{E}_{z \sim \mathcal{D}}[\mathrm{Thr}_k^1(z)]$ = 1 − ,
where $\mathcal{D}$ is a pairwise independent distribution on $\{-1,1\}^k$, and applying SRCSP Assumption 2.
Here we show that for odd $k \ge 25$, $\mathrm{Thr}_k^1$ is 0.1-far from supporting a 4-wise uniform distribution.
The value 0.1 is not sharp, but is chosen as a compromise between a reasonably large value and a
reasonably simple proof.

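Theorem 5.13. There exists a degree-4 multilinear polynomial $Q : \{-1,1\}^k \to \mathbb{R}$ that 0.1-separates
$\mathrm{Thr}_k^1$ for all odd $k \ge 25$.

Proof. Let

$$Q(z) = \frac{8}{27\sqrt{k}} \sum_{\substack{T \subseteq [k] \\ |T|=1}} z^T - \frac{5}{9k^{3/2}} \sum_{\substack{T \subseteq [k] \\ |T|=3}} z^T + \frac{4}{3k^2} \sum_{\substack{T \subseteq [k] \\ |T|=4}} z^T$$

and let

$$Q_u(s) = \frac{1}{18k^2} s^4 - \frac{5}{54k^{3/2}} s^3 + \left(\frac{4}{9k^2} - \frac{1}{3k}\right) s^2 + \left(\frac{31}{54\sqrt{k}} - \frac{5}{27k^{3/2}}\right) s + \frac{1}{6} - \frac{1}{3k}$$

$$\phantom{Q_u(s)} = \frac{1}{54}\left[\frac{3}{k^2} s^4 - \frac{5}{k^{3/2}} s^3 + \left(\frac{24}{k^2} - \frac{18}{k}\right) s^2 + \left(\frac{31}{\sqrt{k}} - \frac{10}{k^{3/2}}\right) s + 9 - \frac{18}{k}\right]. \qquad (18)$$

Then by Lemma 5.8, for all $z \in \{-1,1\}^k$, $Q(z) = Q_u(S_z)$. To simplify $Q_u$, let $\sigma = s k^{-1/2}$. Then we
can rewrite (18) as follows:

$$Q_u(s) = \frac{1}{54}\left[3\sigma^4 - 5\sigma^3 + \left(-18 + \frac{24}{k}\right)\sigma^2 + \left(31 - \frac{10}{k}\right)\sigma + 9 - \frac{18}{k}\right]. \qquad (19)$$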

First we lower-bound Quis) for all a € IR using the following expression, which is equivalent to (19) 
by some algebra. 


Quis) 


54 


3(- + i)V-f)^ + fi(<^ + ifi) 


38987378 i M fin-_Ei2 _ 457 \ 

837621 k vw 241 576/ 


> 


54 




v-t-f (- 


4571 

576/ 


48 _ _8 
54 9’ 


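where the last inequality follows from the fact that $k \ge 25$ and the first two terms are always
nonnegative.

Next we lower-bound $Q_u(s)$ for $s > 0$.

$$Q_u(s) = \frac{1}{54}\left[3\sigma^4 - 5\sigma^3 + \left(-18 + \frac{24}{k}\right)\sigma^2 + \left(31 - \frac{10}{k}\right)\sigma + 9 - \frac{18}{k}\right]$$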

= ^ [3(- - i)^(- - i)^ + - !i)^ + liiio + f + 9c7(a - i)^] > i 

Applying Lemma 5.1, there exists $Q : \{-1,1\}^k \to \mathbb{R}$ such that $Q$ has degree 4 and $Q$ 0.1-separates
$\mathrm{Thr}_k^1$. □

Corollary 5.14. For sufficiently large $n$ and $k$, there exists an efficient algorithm that distinguishes
between random instances of CSP($\mathrm{Thr}_k^1$) with $\tilde{O}(n^2)$ constraints and instances with value at least
0.9 with high probability.


5.5 Predicates satisfied by strings with Hamming weight at least $-O(\sqrt{k})$

In light of the fact that the threshold based predicates above are not 4-wise supporting, one may 
attempt to find another threshold-based predicate. Here we show that a symmetric threshold 
predicate that is 4-wise supporting must be satisfied by all strings with Hamming weight at least 
— Furthermore, there exists a symmetric threshold predicate that is 4-wise supporting with a 
threshold of —0(v^) and we sketch its construction. 

--Vk 

We also consider the predicate Thr^ ^ . While it is not used in [DLSS14], we show that it 

does not support a 4-wise uniform distribution in the interest of obtaining a tighter bound for the 
Hamming weight above which an unbiased, symmetric predicate is not 4-wise supporting. The 
threshold of —i^Vk is particularly interesting in that it asymptotically matches the threshold 6 
below which Thr® is 4-wise supporting. 
Theorem 5.15. Assume $k \ge 99$ and $k$ is odd. Then there exists a degree-4 polynomial Q :


— i\/fc 


— nVk . 


{—1,1}^ —)• R that i^-separates Thr. ^ . Consequently, Thr. ^ is -far from A-wise support¬ 


ing. 


Proof. Define Q : {—1,1}^ —R and : R —R as follows: 
Q(z) = |A:-V2 ^ ^ 


8k 


-2 


Qu(.^^ 


TC[n] 

m=i 


S I ti 

3 ^ 3^72 


TC[n] 

|T|=2 


TC[n] 

|r|=3 


TC[n] 

|r|=4 


+ + 3^) 


* ^ 4 k 


( 20 ) 


Again, for simplicity we set a = sk and obtain the following expression: 

n 1^4,13 7^2,l^,3,l/'8^2,2^ n\ 

Qu{s) — + gCr - jCJ + 2'^+4 + fcl3'^ + 3'^ “ • 

Observe that for A; > 99, | ~ ^ ((^'^ “ i)^ “ fl) ^ lower-bound 

the value of Qu for s > — or equivalently, <7 > — |: 

Q^{s) = \a^ + \a^-la^ + \a + l + \[y + la-2) 

= “ s) w)+('^+1) ((^+mi) + + \ “ 2)+ 

The first two terms are clearly nonnegative when a > — ^, so 


> A _ A = A 

^ 24 48 48' 


We also show that Qu{s) > —-^ for all s E R. 


Q.(.) = ia^ + la3-|a^ + ia + f + HF + i--2) 

= (- + f)'(--i)' + li(- + il)' + S + i(F + i--2)-¥ 


14 

3 ■ 


> {a + (a - V ^ (fj + 

— \y ^ 9 ) 18/ 4- 4gg -r 211; W 365473944 

3 first three terms are always nonnegative, so Qu{s) > — ^• 

--\T 

Applying Lemma 5.1, ^ is ^-far from supporting a 4-wise uniform distribution. 


□ 


Now we demonstrate that there exists a 4-wise uniform distribution supported on $\mathrm{Thr}_k^{-1-2\sqrt{k+1}}$
when $k = 2^m - 1$ for some integer $m \ge 3$.

Claim 5.16. Assume $k = 2^m - 1$ for some integer $m \ge 3$. Then there exists a 4-wise uniform
distribution supported only on $z \in \{-1,1\}^k$ such that $S_z \ge -1 - 2\sqrt{k+1}$.

Proof. Let $C$ be a binary BCH code of length $k$ with designed distance $2t + 1$ and let $C^{\perp}$ be its
dual. Then the uniform distribution on the codewords of $C^{\perp}$ is $2t$-wise uniform [ABI86, MS77]; see
also [AS04, Ch 16.2].

Let $c = c_1 \ldots c_k$ be a codeword of $C^{\perp}$, where each $c_i \in \{-1,1\}$. The Carlitz-Uchiyama
bound [MS77, page 280] states that for all $c \in C^{\perp}$,

$$\sum_{i=1}^{k} \tfrac{1}{2}(1 - c_i) \le \frac{k+1}{2} + (t-1)\sqrt{k+1}.$$

Observe that the quantity $\tfrac{1}{2}(1 - c_i)$ simply maps $c_i$ from $\{-1,1\}$ to $\{0,1\}$ so that we can write the
bound to match the presentation in [MS77]. Therefore,

$$S_c = \sum_{i=1}^{k} c_i = k - 2\sum_{i=1}^{k} \tfrac{1}{2}(1 - c_i) \ge k - (k+1) - (2t-2)\sqrt{k+1} = -1 - (2t-2)\sqrt{k+1}.$$

Setting $t = 2$, we obtain 4-wise uniformity of this distribution, and each string in the support
of the distribution has Hamming weight at least $-1 - 2\sqrt{k+1}$. □
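To make the notion of a t-wise uniform distribution supported on high-weight strings concrete, here is a small brute-force sketch. It does not construct a BCH code; as a stand-in chosen by us for illustration, it uses the even-weight code of length 7 (the dual of the repetition code), and it relies on the standard fact that the uniform distribution on a set of ±1 strings is t-wise uniform exactly when every monomial $z^T$ with $1 \le |T| \le t$ has mean zero over the set. All function names are hypothetical.

```python
from itertools import combinations, product
from math import prod, sqrt

def even_weight_support(k):
    # All +-1 strings of length k with an even number of -1 entries.
    return [z for z in product((-1, 1), repeat=k) if z.count(-1) % 2 == 0]

def is_t_wise_uniform(support, t):
    # Uniform distribution on `support` is t-wise uniform iff
    # the sum of z^T over the support vanishes for every 1 <= |T| <= t.
    k = len(support[0])
    for size in range(1, t + 1):
        for T in combinations(range(k), size):
            if sum(prod(z[i] for i in T) for z in support) != 0:
                return False
    return True

def min_hamming_weight(support):
    # Smallest S_z over the support, with S_z the sum of the coordinates.
    return min(sum(z) for z in support)
```

For k = 7 the smallest weight in this support can be compared against the bound $-1 - 2\sqrt{k+1}$ from Claim 5.16.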

Remark 5.17. In order to construct a 4-wise uniform distribution for any value of $k$, one could
simply express $k$ as a sum of powers of 2, construct separate distributions on disjoint variables
as described above for each power of 2 (down to the minimum length for which we can achieve
distance at least 5, after which point we use the uniform distribution), and obtain a 4-wise uniform
distribution. The total Hamming weight of a vector supported by this distribution would then be
at least $-O(\sqrt{k})$.

6 SOS refutation proofs 

6.1 The SOS proof system 

We first define the SOS proof system introduced in [GV01]. Call a polynomial $q \in \mathbb{R}[X_1, \ldots, X_n]$
sum-of-squares (SOS) if there exist $q_1, \ldots, q_\ell \in \mathbb{R}[X_1, \ldots, X_n]$ such that $q = q_1^2 + \cdots + q_\ell^2$.

Definition 6.1. Let $X = (X_1, \ldots, X_n)$ be indeterminates. Let $q_1, \ldots, q_m, r_1, \ldots, r_{m'} \in \mathbb{R}[X]$ and
let $A = \{q_1 \ge 0, \ldots, q_m \ge 0\} \cup \{r_1 = 0, \ldots, r_{m'} = 0\}$. There is a degree-$d$ SOS proof that $A$ implies
$s \ge 0$, written as

$$A \vdash_d s \ge 0,$$

if there exist SOS $u_0, u_1, \ldots, u_m \in \mathbb{R}[X]$ and $v_1, \ldots, v_{m'} \in \mathbb{R}[X]$ such that

$$s = u_0 + \sum_{i=1}^{m} u_i q_i + \sum_{i=1}^{m'} v_i r_i$$

with $\deg(u_0) \le d$, $\deg(u_i q_i) \le d$ for all $i \in [m]$, and $\deg(v_i r_i) \le d$ for all $i \in [m']$. If it also holds
that $u_0, u_1, \ldots, u_m = 0$, we will write $A \vdash_d s = 0$.

It is well-known that a degree-$d$ SOS proof can be found using an SDP of size $n^{O(d)}$ if it
exists [Sho87, Par00, Las00, Las01].

In this section, we will take the set $A$ to be $\{x_i^2 = 1\}_{i \in [n]}$, enforcing that variables are ±1-valued.
We show that with high probability there exists a low-degree SOS proof that a polynomial
representing the value of a CSP instance is close to its expectation.

For more information on the SOS proof system and its applications to approximation algorithms, 
see, e.g., [OZ13,Lau09]. 
6.2 SOS certification of quasirandomness

All of our SOS results rely on the following theorem, which is the SOS version of Theorem 4.1.

Theorem 6.2. For $k \ge 2$ and $p \ge n^{-k/2}$, let $\{w(T)\}_{T \in [n]^k}$ be independent random variables such
that for each $T \in [n]^k$,

$$\mathbf{E}[w(T)] = 0 \qquad (21)$$

$$\Pr[w(T) \ne 0] \le p \qquad (22)$$

$$|w(T)| \le 1. \qquad (23)$$

Then, with high probability,

$$\{x_i^2 \le 1\}_{i \in [n]} \vdash_{2k} \sum_{T \in [n]^k} w(T) x^T \le 2^{O(k)} \sqrt{p}\, n^{3k/4} \log^{3/2} n.$$

This theorem was essentially proven by Barak and Moitra [BM15]. We give a proof in Appendix A.3.
We first use this theorem to show that an SOS version of Lemma 4.4 holds.


Lemma 6.3. Let S C [k\ with [S'! = s > 0. Let r G N and let {wu{i)fu^\^nY,i^\T] independent 
random variables satisfying conditions ( 6 ), (7), and ( 8 ) for some p > ■ Then, with high 

probability. 


T 

{xj < l}ie[n]\- 2 s Y 

ue[nY i=i 


2^^^^ yfrp ■ log®/^ n 
4 maxjyTp, 1} • n log n 


if s >2 
if s = l. 


Proof. We sketch the differences from the proof of Lemma 4.4 given in Section 4.4. For $s \ge 2$, the
lemma follows by using Theorem 6.2 instead of Theorem 4.1. If $s = 1$, it suffices to show that

$$\{x_i^2 \le 1\} \vdash_2 v(i) x_i \le |v(i)|$$

for any $v$, since summing over all $i$ as in (10) finishes the proof. If $v(i) \ge 0$, observe that

$$|v(i)| - v(i) x_i = \frac{|v(i)|}{2}(x_i - 1)^2 + \frac{|v(i)|}{2}(1 - x_i^2).$$

If $v(i) < 0$, we use $(x_i + 1)^2$ instead of $(x_i - 1)^2$.


□ 
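The degree-2 identity used for the case s = 1 can be checked mechanically. The sketch below is illustrative only (the function name is ours); it evaluates both sides of the identity with exact rationals, choosing $(x-1)^2$ or $(x+1)^2$ according to the sign of v as in the proof.

```python
from fractions import Fraction

def sos_identity_sides(v, x):
    # Left side: |v| - v*x.
    # Right side: (|v|/2)(x - sign(v))^2 + (|v|/2)(1 - x^2),
    # a square times a nonnegative constant plus a nonnegative
    # multiple of the constraint polynomial 1 - x^2.
    v, x = Fraction(v), Fraction(x)
    sign = 1 if v >= 0 else -1
    lhs = abs(v) - v * x
    rhs = abs(v) / 2 * (x - sign) ** 2 + abs(v) / 2 * (1 - x ** 2)
    return lhs, rhs
```

Because the two sides agree as polynomials in x, the check succeeds for any x, not just for x in [-1, 1]; the constraint $x^2 \le 1$ is only needed to conclude that the right side is nonnegative.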


The lemma implies an SOS version of Lemma 4.3. To make this precise, we define a specific 
polynomial representation of Dx,x{S)'. 

= 4 E E i((T,=)a)c®4. 

TE[n]^ cE{dil}^ 

where x^ = OieS^T- Note that this is a polynomial in the Xj’s. 

We can show these polynomials are not too large. 


Lemma 6.4. Let 0 7 ^ 5 C [A:] with [S'] = s. Then 


{xl < l}* 6 [n] ^ 2 . < 

{xj < l}je[n] l“2s Dx,x{Sf°^^ > 


2 ^(^) max{n®/^, ^/n} log^^^ n 

77j;l/2 

2‘4{^) max{n^/^, y/n} log^^^ n 

fxTl2 


with high probability, assuming also that m > max{n®/^,n}. 
Proof. The proof is essentially the same as that of Lemma 4.3. The expression we bound in that
proof is exactly $D_{\mathcal{I},x}(S)^{\mathrm{poly}}$. We use Lemma 6.3 instead of Lemma 4.4 to show that this can be
done in degree-$2s$ SOS. □


Based on Lemma 3.13, we will think of Lemma 6.4 as giving an SOS proof of quasirandomness. 
Below, we use it to prove SOS versions of Theorems 4.6 and 4.9. 

6.3 Strong refutation of any k-CSP

We now define the natural polynomial representation of Valx(x): 

Valf'y(x) = ^ l{(Tc)eX} ( X] ^(*5)0^4 ) , 

Te[n]'=ce{±i}'= \sc[fc] / 

where Xj, is as above. 

We can then give an SOS proof strongly refuting CSP(P). 

Theorem 6.5. Given an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of CSP(P),

{xf < l}i6[n] '^2k Val5°'^(x) < P + 7 

with high probability when m > ^ ^ ^ • 

Proof. By rearranging terms, we see that 

ValP°'y(x) = P+ ^ P(5)P^(5)P°'T 

0^5C[A:] 

Note that this is just Plancherel’s Theorem in SOS. The theorem then follows from Lemma 6.4 and 
the observation that X] 5 c[A:] l-f’('S’)| < □ 

6.4 $\Omega(1)$-refutation of non-t-wise supporting CSPs

Theorem 6.6. Let P be $\delta$-far from being t-wise supporting. Given an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of
CSP(P),

{Xi = Pmax{fc,2t} Val5°^^(a:) < 1 - (5 + 7. 

with high probability when m > ^ ^ ^ and t >2. 

To prove this theorem, we will need the following claim, which says that any true inequality in
$k$ variables over $\{-1,1\}^k$ can be proved in degree-$k$ SOS. Recall that the multilinearization of a
monomial $z_1^{s_1} \cdots z_k^{s_k} \in \mathbb{R}[z_1, \ldots, z_k]$ is defined to be $z_1^{s_1 \bmod 2} \cdots z_k^{s_k \bmod 2}$, i.e., we replace
all $z_i^2$ factors by 1. We extend this definition to all polynomials in $\mathbb{R}[z_1, \ldots, z_k]$ by linearity.

Claim 6.7. Let $f : \{-1,1\}^k \to \mathbb{R}$ be such that $f(z) \ge 0$ for all $z \in \{-1,1\}^k$ and let $f^{\mathrm{poly}}$ be the
unique multilinear polynomial such that $f(z) = f^{\mathrm{poly}}(z)$ for all $z \in \{-1,1\}^k$. Then

$$\{z_i^2 = 1\}_{i \in [k]} \vdash_k f^{\mathrm{poly}}(z) \ge 0.$$

Proof. Since $f(z) \ge 0$, there exists a Boolean function $g : \{-1,1\}^k \to \mathbb{R}$ such that $g(z)^2 = f(z)$
for all $z \in \{-1,1\}^k$. Let $g^{\mathrm{poly}}$ be the unique multilinear polynomial such that $g(z) = g^{\mathrm{poly}}(z)$ for
all $z \in \{-1,1\}^k$. Since $g^{\mathrm{poly}}(z)^2 = f^{\mathrm{poly}}(z)$ for all $z \in \{-1,1\}^k$, uniqueness of the multilinear
polynomial representation of $f$ implies that the multilinearization of $(g^{\mathrm{poly}})^2$ is equal to $f^{\mathrm{poly}}$.

Written another way, we have that $\{z_i^2 = 1\}_{i \in [k]} \vdash_k f^{\mathrm{poly}}(z) = g^{\mathrm{poly}}(z)^2$. This implies that
$\{z_i^2 = 1\}_{i \in [k]} \vdash_k f^{\mathrm{poly}}(z) \ge 0$. □

Proof of Theorem 6.6. The proof is an SOS version of Proof 2 of Theorem 4.9 above. Claim 6.7
implies that for $Q$ of degree at most $t$ that $\delta$-separates P,

$$\{z_i^2 = 1\}_{i \in [k]} \vdash_k Q(z) - P(z) + 1 - \delta \ge 0.$$

Summing over all constraints, we get 

{xi = IjieM ValP°'y(x) “ (1 “ <^) < ^ X] l{(Tc)eX} [ Y Q{S)Fx^ 

^ T&[n]k C&{±1}>‘ \5C[fc] 

Rearranging terms as in the proof of Theorem 6.5, we see that the right hand side is equal to 

sc[fc] 

Since Q has mean 0, IQI < 2 ^ and X]sc[fc] l-P(‘S')l < The theorem then follows from 

Lemma 6.4. □ 

With Corollary 3.17, the theorem implies that we can $\Omega_k(1)$-refute any CSP(P) in SOS when
P is not t-wise supporting.

Corollary 6.8. Let P be a predicate that does not support any t-wise uniform distribution. Given
an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ of CSP(P),

{Xi = l}iew '“max{fc, 2 t} Valx(x) < 1 - + 7 

with high probability when m > igg^ n and t > 2. 

7 Directions for future work 

It would be interesting to show analogous efficient refutation results for models of random CSP(P)
in which literals are not used. This would allow for results on, say, refuting q-colorability for random
k-uniform hypergraphs. We give a simple result on refuting q-colorability of random hypergraphs
in Appendix C, but it follows from refutation of binary CSPs and perhaps a stronger result could
be proven by studying CSPs with larger alphabets. For some predicates (e.g., monotone Boolean
predicates), random CSP instances are trivially satisfiable when there are no literals. However, for
such predicates one could consider a "Goldreich [Gol00]-style" model in which each constraint is
randomly either P or ¬P applied to k random variables.

Additionally, it would be good to investigate whether our refutation algorithms can be extended
from the purely random CSP(P) setting to the "smoothed" / "semi-random" setting of Feige [Fei07],
in which the m constraint scopes are worst-case and only the negation pattern for literals is random.
Feige showed how to efficiently refute random 3-SAT instances with m > constraints even

in this model. 

Another valuable open direction would be to shore up the known proof-complexity evidence
suggesting that $\tilde{\Omega}(n^{t/2})$ constraints might be necessary to refute random CSP(P) when P is not
t-wise supporting. The natural question here is whether the SOS lower bound of [BCK15] can be
extended from non-pairwise uniform supporting and $m = O(n)$ constraints, to non-t-wise uniform
supporting and m = constraints. (Of course, it would also be good to eliminate the 

pruning step from their random instances.) One might also investigate the more refined question of 
whether, for P that are $\delta$-far from t-wise supporting, one can improve on $\delta$-refutation when there
are m > constraints. 

Followup work on the very interesting paper [FKO06] of Feige, Kim, and Ofek also seems 
warranted. Recall that it gives a nondeterministic refutation algorithm for random 3-SAT when 
m > (as well as a subexponential-time deterministic algorithm). This raises the question of 

whether there exist polynomial-size refutations for random CSP(P) instances that are nevertheless 
hard to find efficiently. 

Finally, we suggest trying to rehabilitate the hardness-of-learning results in [DLSS14], given
our new knowledge about what random CSP(P) instances seem hard to refute. As mentioned,
the followup work of Daniely and Shalev-Shwartz [DSS14] shows hardness of PAC-learning DNFs
with $\omega(\log n)$ terms based on the very reasonable assumption that refuting random k-SAT requires
$n^{f(k)}$ constraints for some $f(k) = \omega(1)$. Subsequent work by Daniely [Dan15] shows hardness of
approximately PAC-learning halfspaces assuming that refuting random k-XOR is hard both when
m = •jpPkiogk when k is polylogarithmic in n and m = for some c > 0. One future direction 
would be to obtain hardness results for agnostically learning intersections of halfspaces. 
A Proof of Theorem 4.1

We restate Theorem 4.1:

Theorem 4.1. For $k \ge 2$ and $p \ge n^{-k/2}$, let $\{w(T)\}_{T \in [n]^k}$ be independent random variables such
that for each $T \in [n]^k$,

$$\mathbf{E}[w(T)] = 0 \qquad (24)$$

$$\Pr[w(T) \ne 0] \le p \qquad (25)$$

$$|w(T)| \le 1. \qquad (26)$$

Then there is an efficient algorithm certifying that

$$\sum_{T \in [n]^k} w(T) x^T \le 2^{O(k)} \sqrt{p}\, n^{3k/4} \log^{3/2} n \qquad (27)$$

for all $x \in \mathbb{R}^n$ with $\|x\|_\infty \le 1$ with high probability.

The proof of this theorem constitutes the remainder of this section. It will often be convenient
to consider $T \in [n]^k$ to be $(T_1, T_2) \in [n]^{k_1} \times [n]^{k_2}$ with $k_1 + k_2 = k$. In such situations, we will write
$w(T) = w(T_1, T_2)$. For intuition, the reader can think of the special case of $w(T) \in \{-1, 0, 1\}$ for
all T and y G {—1,1}”'. Under these additional constraints, Opt(X) — | for a 

random k-XOR instance $\mathcal{I}$, so we are certifying that a random k-XOR instance does not have value
much bigger than 1/2.

A.l The even arity case 

When $k$ is even, we can think of $\sum_T w(T) x^T$ as a quadratic form:

$$\sum_{T \in [n]^k} w(T) x^T = \sum_{T_1, T_2 \in [n]^{k/2}} w(T_1, T_2)\, y_{T_1} y_{T_2}, \qquad (28)$$

where $y_U = x^U$. We give two methods to certify that the value of this quadratic form is at most
$O_k(\sqrt{p}\, n^{3k/4} \log n)$. The first method uses an SDP-based approximation algorithm and works only
for $x \in \{-1,1\}^n$. The second method uses ideas from random matrix theory and works for any $x$
with $\|x\|_\infty \le 1$.
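The rewriting in (28) is just a regrouping of each k-tuple T into two halves. The sketch below, with hypothetical helper names and a toy weight model of our own (each weight is ±1 with probability p/2 each and 0 otherwise, which satisfies the mean, sparsity, and boundedness conditions), compares the direct sum with the quadratic form in $y = x^{\otimes k/2}$ for small n and even k.

```python
import random
from itertools import product
from math import prod

def monomial(x, T):
    # x^T: the product of x_i over the entries i of the tuple T.
    return prod(x[i] for i in T)

def random_weights(n, k, p, rng):
    # Toy weights: nonzero with probability p, and then a uniform sign.
    return {T: (rng.choice((-1, 1)) if rng.random() < p else 0)
            for T in product(range(n), repeat=k)}

def direct_sum(w, x, n, k):
    # Left side of (28).
    return sum(w[T] * monomial(x, T) for T in product(range(n), repeat=k))

def quadratic_form(w, x, n, k):
    # Right side of (28): y^T B y with y_U = x^U and B[T1][T2] = w(T1, T2).
    idx = list(product(range(n), repeat=k // 2))
    y = {U: monomial(x, U) for U in idx}
    return sum(w[T1 + T2] * y[T1] * y[T2] for T1 in idx for T2 in idx)
```

Tuple concatenation `T1 + T2` plays the role of writing $T = (T_1, T_2)$; the matrix B is never materialized here since the sketch only checks the identity.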

Approximation algorithms approach. If $x \in \{-1,1\}^n$, we can apply an approximation algorithm
of Charikar and Wirth [CW04] for quadratic programming. They prove the following
theorem:

Theorem A.1. [CW04, Theorem 1] Let $M$ be any $n \times n$ matrix with all diagonal elements 0.
There exists an efficient randomized algorithm that finds $y \in \{-1,1\}^n$ such that

$$\mathbf{E}[y^{\top} M y] \ge \Omega\!\left(\frac{1}{\log n}\right) \max_{x \in \{-1,1\}^n} x^{\top} M x.$$

By Markov's Inequality, this statement holds with probability at least 1/2. We can run the
algorithm $O(\log n)$ times to get a high probability result. To apply Theorem A.1, we separate out
the diagonal terms of (28), rewriting it as

$$\sum_{T_1 \ne T_2 \in [n]^{k/2}} w(T_1, T_2)\, y_{T_1} y_{T_2} + \sum_{U \in [n]^{k/2}} w(U, U)\, y_U^2. \qquad (29)$$
We can certify that each of the two terms in this expression is at most $O(\sqrt{p}\, n^{3k/4} \log n)$. For the
first term, we will need the following claim.

Claim A.2. With high probability, it holds that

$$\sum_{T_1 \ne T_2 \in [n]^{k/2}} w(T_1, T_2)\, y_{T_1} y_{T_2} \le O(\sqrt{p}\, n^{3k/4})$$

for all $y \in \{-1,1\}^{n^{k/2}}$.

This follows from applying Bernstein's Inequality (Theorem 4.12) for fixed $y$ and then taking
a union bound over all $y \in \{-1,1\}^{n^{k/2}}$. Using the claim, we see that Theorem A.1 allows us to
certify that the value of the first term in (29) is at most $O(\sqrt{p}\, n^{3k/4} \log n)$.

We will use the next claim to bound the second term of (29).

Claim A.3. With high probability, it holds that

$$\sum_{U \in [n]^{k/2}} |w_{U,U}| \le O(\sqrt{p}\, n^{3k/4}).$$

Since $|w_T| \le 1$ and $\Pr[w_T \ne 0] \le p$, the claim follows from the Chernoff Bound. The second
term of (29) is upper bounded by $\sum_U |w_{U,U}|$ and we can compute this quantity in polynomial
time to certify that its value is at most $O(\sqrt{p}\, n^{3k/4})$.


Random matrix approach. Observe that (28) is $y^{\top} B y$ for a matrix $B$ indexed by $U \in [n]^{k/2}$
so that $B_{U_1, U_2} = w(U_1, U_2)$. Then $y^{\top} B y \le \|B\| \|y\|^2$. To certify that $y^{\top} B y$ is small, we compute
$\|B\|$. We need to show that $\|B\|$ is small with high probability. First, note that $\|B\|$ is equal to
the norm of the $2n^{k/2} \times 2n^{k/2}$ symmetric matrix

$$\tilde{B} = \begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix}.$$

For example, this appears as (2.80) in [Tao12]. The upper triangular entries of $\tilde{B}$ are independent
random variables with mean 0 and variance at most $p$ by the properties of the $w_T$'s. We can then
apply a standard bound on the norm of random symmetric matrices [Tao12].

Proposition A.4. [Tao12, Proposition 2.3.13] Let $M$ be a random symmetric $n \times n$ matrix whose
upper triangular entries $M_{ij}$ with $i \le j$ are independent random variables with mean 0, variance at
most 1, and magnitude at most $K$. Then, with high probability,

$$\|M\| = O(\sqrt{n} \log n \cdot \max\{1, K/\sqrt{n}\}).$$

Let $B' = p^{-1/2} \tilde{B}$. The upper triangular entries of $B'$ are independent random variables with mean
0, variance at most 1, and magnitude at most $1/\sqrt{p}$. Applying Proposition A.4 to $B'$ shows that
$\|B\| = O\!\left(k\, n^{k/4} \sqrt{p} \log n \cdot \max\{1, \tfrac{1}{\sqrt{p}\, n^{k/4}}\}\right)$ with high probability. Since $\|x\|_\infty \le 1$ by assumption,
$\|y\|^2 \le n^{k/2}$ and (28) is at most $O(k \sqrt{p}\, n^{3k/4} \log n)$ with high probability when $p \ge n^{-k/2}$.
A.2 The odd arity case

Fix an assignment $x \in [-1,1]^n$. For $i \in [n]$, the monomials containing $x_i$ can contribute at most

$$W_i := \Big|\sum_{T \in [n]^{k-1}} w(T, i)\, x^T\Big|$$

to the objective if $x_i$ is set optimally. By Cauchy-Schwarz,

$$\sum_{T \in [n]^k} w(T) x^T \le \sum_{i \in [n]} W_i \le \sqrt{n \sum_{i \in [n]} W_i^2}, \qquad (30)$$

so it suffices to bound $\sum_i W_i^2$. We will write this as a quadratic polynomial and then bound it
using spectral methods:

$$\sum_{i \in [n]} W_i^2 = \sum_{T, U \in [n]^{k-1}} \sum_{i \in [n]} w(T, i)\, w(U, i)\, x^T x^U. \qquad (31)$$

Define the $n^{k-1} \times n^{k-1}$ matrix $A$ indexed by $[n]^{k-1}$:

$$A_{(i_1, i_2), (j_1, j_2)} = \begin{cases} \sum_{\ell \in [n]} w(i_1, j_1, \ell)\, w(i_2, j_2, \ell) & \text{if } (i_1, j_1) \ne (i_2, j_2) \\ 0 & \text{otherwise,} \end{cases} \qquad (32)$$

where we have divided the indices of $A$ into 2 blocks of $(k-1)/2$ coordinates each. Define $x^{\otimes k-1} \in \mathbb{R}^{[n]^{k-1}}$
so that $x^{\otimes k-1}(T) = x^T$. Then (31) is equal to

$$(x^{\otimes k-1})^{\top} A\, x^{\otimes k-1} + \sum_{T, U \in [n]^{(k-1)/2}} \sum_{i \in [n]} w(T, U, i)^2 (x^T)^2 (x^U)^2. \qquad (33)$$

The first term is at most $\|A\|\, n^{k-1}$ since the variables are bounded. We can compute $\|A\|$ to certify
this. With high probability, $\|A\|$ is not too big.

Lemma A.5. Let $k \ge 3$ and $p \ge n^{-k/2}$. Let $\{w(T)\}_{T \in [n]^k}$ be independent random variables satisfying
conditions (6), (7), and (8) above. Let $A$ be defined as in (32). With high probability,

$$\|A\| \le 2^{O(k)} p\, n^{k/2} \log^3 n.$$

We can therefore certify that the first term is at most $2^{O(k)} p\, n^{3k/2-1} \log^3 n$. We will prove the lemma in
Appendix A.4.

The second term of (33) is at most $\sum_{T, U, i} w(T, U, i)^2$. We can easily compute this and the Chernoff
Bound implies that its value is at most $p\, n^{3k/2-1}$ with high probability.

So far, with high probability we can certify that $\sum_{i \in [n]} W_i^2 \le 2^{O(k)} p\, n^{3k/2-1} \log^3 n$. Plugging
this bound into (30) concludes the proof.

Remark A.6. It would have been more natural to have written $\sum_i W_i^2 = (x^{\otimes k-1})^{\top} A'\, x^{\otimes k-1}$
for $A'$ such that $A'_{T,U} = \sum_{i \in [n]} w(T, i)\, w(U, i)$. However, $\|A'\|$ could be too large because of the
contribution of the second term in (33). We use the additional assumption that $\|x\|_\infty \le 1$ to get
around this issue.


Remark A.7. We have defined A so that ^(ii,i 2 ),(ji,i 2 ) ~ Ste[n]not Ajj = 
31'he reduces the correlation among entries w{b,c) and w{b,c') for c / c'. 
Intuitively, A looks more like a random matrix with independent entries, so we can bound its norm 
using the trace method. See the proof of Lemma A.5 in Appendix A. 4. 
A.3 An SOS version

In this section, we will prove the SOS version of Theorem 4.1.

Theorem 6.2. For $k \ge 2$ and $p \ge n^{-k/2}$, let $\{w(T)\}_{T \in [n]^k}$ be independent random variables such
that for each $T \in [n]^k$,

$$\mathbf{E}[w(T)] = 0, \qquad \Pr[w(T) \ne 0] \le p, \qquad |w(T)| \le 1.$$

Then, with high probability,

$$\{x_i^2 \le 1\}_{i \in [n]} \vdash_{2k} \sum_{T \in [n]^k} w(T) x^T \le 2^{O(k)} \sqrt{p}\, n^{3k/4} \log^{3/2} n.$$

Rather than writing out the full proof, we will indicate the small changes required to convert
the above proof of Theorem 4.1 into SOS form.

Even arity. The random matrix proof for the even case can easily be converted into an SOS proof
with degree $k$. When $O(k \sqrt{p}\, n^{k/4} \log n) I - B \succeq 0$, there exists a matrix $M$ such that $M^{\top} M =
O(k \sqrt{p}\, n^{k/4} \log n) I - B$. Then

$$O(k \sqrt{p}\, n^{k/4} \log n) \|y\|^2 - y^{\top} B y = (My)^{\top} (My) = \sum_{T \in [n]^{k/2}} \Big(\sum_{U \in [n]^{k/2}} M_{T,U}\, y_U\Big)^2,$$

so

$$\{x_i^2 \le 1\}_{i \in [n]} \vdash_k \sum_{T \in [n]^k} w(T) x^T \le O(k \sqrt{p}\, n^{3k/4} \log n).$$

Odd arity. A couple of additional issues arise in the odd case. First of all, the square root in (30)
is not easily expressed in SOS, so we instead prove the squared version

$$\Big(\sum_{T \in [n]^k} w(T) x^T\Big)^2 \le 2^{O(k)} p\, n^{3k/2} \log^3 n. \qquad (34)$$

By a simple extension of [OZ13, Fact 3.3], (34) implies (27) in SOS:

Fact A.8.

$$\{X^2 \le b^2\} \vdash_2 X \le b.$$

Proof.

$$\frac{1}{2b}(b^2 - X^2) + \frac{1}{2b}(b - X)^2 = \frac{b}{2} - \frac{1}{2b} X^2 + \frac{b}{2} - X + \frac{1}{2b} X^2 = b - X.$$

□
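The one-line proof of Fact A.8 is a polynomial identity in X for a fixed positive b, so it can be confirmed with exact arithmetic. This sketch is only a check of that identity (the function name is ours, and b > 0 is assumed so that the multiplier 1/(2b) is a nonnegative constant).

```python
from fractions import Fraction

def fact_a8_sides(b, x):
    # Left side: (1/(2b)) (b^2 - x^2) + (1/(2b)) (b - x)^2,
    # i.e. a nonnegative multiple of the hypothesis b^2 - x^2 >= 0
    # plus a nonnegative multiple of a square.
    # Right side: b - x, the quantity certified to be nonnegative.
    b, x = Fraction(b), Fraction(x)
    lhs = (b ** 2 - x ** 2) / (2 * b) + (b - x) ** 2 / (2 * b)
    return lhs, b - x
```

Every term on the left has degree at most 2 in X, which is why the derivation counts as a degree-2 SOS proof.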


Secondly, we do not know how to prove the Cauchy-Schwarz inequality (30) in SOS. However,
O'Donnell and Zhou show that a very similar inequality can be proved in SOS [OZ13, Fact 3.8]:

Fact A.9.

$$\vdash_2 YZ \le \tfrac{1}{2} Y^2 + \tfrac{1}{2} Z^2.$$

Using this fact instead of Cauchy-Schwarz to prove the squared version of (30), we can follow
the argument above to show that

{x- < l}i6[n] h 2 fc ( + n Y ■ 

\Te[n]'= / Te[n]'“ 

The norm bound can be proven in SOS exactly as in the even case. 


A.4 Proof of Lemma A.5

We restate the definition of the matrix $A$ and the statement of the lemma.

Lemma A.5. Let $k \ge 3$ and $p \ge n^{-k/2}$. Let $\{w(T)\}_{T \in [n]^k}$ be independent random variables satisfying
conditions (6), (7), and (8) above. Let $A$ be the $n^{k-1} \times n^{k-1}$ matrix indexed by $[n]^{k-1}$ that is defined as
follows:

$$A_{(i_1, i_2), (j_1, j_2)} = \begin{cases} \sum_{\ell \in [n]} w(i_1, j_1, \ell)\, w(i_2, j_2, \ell) & \text{if } (i_1, j_1) \ne (i_2, j_2) \\ 0 & \text{otherwise.} \end{cases}$$

Then with high probability,

$$\|A\| \le 2^{O(k)} p\, n^{k/2} \log^3 n.$$

The proof closely follows the arguments of [COGL04, Lemma 17] and [BM15, Section 5]. Both
proofs use the trace method: to bound the norm of a symmetric random matrix $M$, it suffices to
bound $\mathbf{E}[\mathrm{tr}(M^r)]$ for large $r$. For non-symmetric matrices, we can instead work with $M M^{\top}$. In
our particular case, we have the following.

Claim A.10. //E[tr((^A''~)'’)] < , then ||^|| < n with high 

probability. 

Proof. Observe that ||A|p^ < tr((AA"'')^). By Markov’s Inequality, Pr[||A|| > B] < ^ . 

We get the claim by plugging in r = 0(logn) and setting constants appropriately. □ 

Remark A.11. We can get arbitrarily small l/poly(n) probability of failure: This proof shows 
that ||j4|| < K2^^^^pn^/‘^\o^ n with probability at most 

In the remainder of this section, we will bound $\mathbf{E}[\mathrm{tr}((A A^{\top})^r)]$.

Lemma A.12. Under the conditions of Lemma A.5, E[tr((44''~)^)] < with 

high probability. 

Proof. Recall that we index 4 by elements of divided into two blocks of coordinates each. 
First, note that 

tr((44 ) )— (44 )('jj^^j 2 ),(i 3 ,i 4 ) (^^ )(*3,*4),(*5,*6) ' ' ' (^^ )(*2r-l42r),(*l,*2)' 

k-1 

n,...,*2r£H ^ 

Expanding this out using the definition of 4 and setting wt = w{T), we get that 


tr((44 ) ) ^ ^ ^n,jl/l^»2,j2/l^»3,jl/2^M,j2,<?2 ' ' ' ^*2r-l,i27— 1 /2r-l ^*2r ,i2r/2r-l ^*1 ,j2r-l,<?2r ^*2 ,i2r/2r ) 
where the sum is over $\ell_1, \ldots, \ell_{2r} \in [n]$ and $i_1, \ldots, i_{2r}, j_1, \ldots, j_{2r} \in [n]^{\frac{k-1}{2}}$ satisfying


(*s+2)Js) 7^ (^s+3)Js+l) 
(*l)j2r—l) 7^ (^2)j2r)- 


for 1 < s < 2r - 1 (35) 

for 1 < s < 2r - 3 (36) 

(37) 


fc —1 

Let 0 be the set of all (ii,..., i 2 r,ji, ■ ■ ■ ,j 2 r) £ ([n]~^)^'’ satisfying (35), (36), and (37). Then for 
J E n and L = {£i,... ,i 2 r) £ N^’', define 


1, J1 ,^1 ,7l 1^2 ,J2 5^2 


' Wi. 


Lr-l,J2r-l,^2r-l ^i2r ,j2r ,i2r-1 ,j2r-1 ,i2r ^i2,j2r,i2 


(38) 


fc —1 

Let |J| = |{ii, ■ ■ ■, i 2 r, ji) ■ ■ ■) J 2 r}| be the number of distinct elements of [n]^“ in J and define 
\L\ = |{7i,... ,i 2 r}\ similarly. We then have 


4r 2r 

E[tr{(AAT)'')] = E|pm = EEE E 

|7|=a |L|=b 


To bound this sum, we will start by bounding $\mathbf{E}[P_{J,L}]$. We will need two claims.

Claim A.13. The number of distinct $w_{i,j,\ell}$ factors in $P_{J,L}$ is at least $2|L|$.

Proof. For each $\ell \in L$, (38) shows that $P_{J,L}$ contains a pair of the form $w_{i_s, j_s, \ell}\, w_{i_{s+1}, j_{s+1}, \ell}$ or
$w_{i_{s+2}, j_s, \ell}\, w_{i_{s+3}, j_{s+1}, \ell}$. Since $J \in \Omega$, we know that $(i_s, j_s) \ne (i_{s+1}, j_{s+1})$ or $(i_{s+2}, j_s) \ne (i_{s+3}, j_{s+1})$, so
each of these pairs must have two distinct $w_{i,j,\ell}$ factors. We then have at least $2|L|$ distinct $w_{i,j,\ell}$
factors. □

Claim A.14. The number of distinct $w_{i,j,\ell}$ factors in $P_{J,L}$ is at least $|J| - 2$.

Proof. Consider looking over the factors of $P_{J,L}$ from left to right in the order of (38) until we
have seen all elements of $J$. The first pair $w_{i_1, j_1, \ell_1}\, w_{i_2, j_2, \ell_1}$ contains at most four previously-unseen
elements of $J$. Every subsequent pair of factors $w_{i_s, j_s, \ell}\, w_{i_{s+1}, j_{s+1}, \ell}$ or $w_{i_{s+2}, j_s, \ell'}\, w_{i_{s+3}, j_{s+1}, \ell'}$ in $P_{J,L}$
shares two variables of $J$ with its preceding pair. Each such pair can then contain at most two new
elements of $J$. After seeing $u$ of the $w_{i,j,\ell}$'s, we have therefore seen at most $4 + 2\left(\frac{u-2}{2}\right)$ distinct elements
of $J$. To get all $|J|$ elements of $J$, we must have seen at least $|J| - 2$ of the $w_{i,j,\ell}$'s and these must be
distinct. □

Since $|w_{i,j,\ell}| \le 1$ and $\Pr[w_{i,j,\ell} \ne 0] \le p$, $\mathbf{E}[P_{J,L}] \le p^{\#\{\text{distinct factors in } P_{J,L}\}}$. It follows that

$$\mathbf{E}[P_{J,L}] \le p^{\max\{2|L|,\, |J|-2\}}.$$

The two claims also imply two other facts we will need below.

Claim A.15. If $|L| > r$, then $\mathbf{E}[P_{J,L}] = 0$.

Proof. We will show that if $|L| > r$, there is a $w_{i,j,\ell}$ factor in $P_{J,L}$ that occurs exactly once. Since
$\mathbf{E}[w_{i,j,\ell}] = 0$, this proves the claim.

Assume for a contradiction that $|L| > r$ and every factor occurs at least twice. Since there
are at least $2|L|$ distinct $w_{i,j,\ell}$'s, there must be at least $4|L| > 4r$ total $w_{i,j,\ell}$'s. However, looking at
(38), $P_{J,L}$ has at most $4r$ $w_{i,j,\ell}$ factors. □




Claim A.16. If $|J| > 2r + 2$, then $\mathbf{E}[P_{J,L}] = 0$.

This can be proved in exactly the same manner. 

Next, observe that the number of choices of $J$ with $|J| = a$ is at most $n^{a(k-1)/2}(4r)^{4r}$. The number of choices of $L$ with $|L| = b$ is at most $n^{b}(2r)^{2r}$. All together, we can write

$$\mathbf{E}[\mathrm{tr}((AA^\top)^r)] \le \sum_{a=1}^{2r+2}\sum_{b=1}^{r}(4r)^{4r}(2r)^{2r}\, n^{a(k-1)/2+b}\, p^{\max\{2b,\,a-2\}}.$$


We bound each term of the sum. 

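Claim A.17.

$$n^{a(k-1)/2+b}\,p^{\max\{2b,\,a-2\}} \le n^{kr+k-1}p^{2r}.$$

Proof. If $2b \ge a - 2$,

$$n^{a(k-1)/2+b}\,p^{\max\{2b,\,a-2\}} \le n^{(2b+2)(k-1)/2+b}\,p^{2b} = n^{k-1}(n^kp^2)^{b}.$$

If $2b < a - 2$,

$$n^{a(k-1)/2+b}\,p^{\max\{2b,\,a-2\}} \le n^{a(k-1)/2+a/2-1}\,p^{a-2} = n^{k-1}(n^kp^2)^{a/2-1}.$$

Recall that we assumed $n^kp^2 \ge 1$. Since $a \le 2r + 2$ and $b \le r$, the claim follows. □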
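The inequality of Claim A.17 can be sanity-checked by brute force. The sketch below is illustrative only: the function name `claim_a17_holds` and the sample parameters are assumptions, and the inequality is taken in the form $n^{a(k-1)/2+b}p^{\max\{2b,a-2\}} \le n^{kr+k-1}p^{2r}$ over $1 \le a \le 2r+2$, $1 \le b \le r$. Both sides are squared so that all exponents are integers and the comparison is exact.

```python
from fractions import Fraction


def claim_a17_holds(n, p, k, r):
    """Check n^{a(k-1)/2+b} p^{max(2b,a-2)} <= n^{kr+k-1} p^{2r} for all a, b.

    Both sides are squared, so every exponent is an integer and the
    comparison is done exactly with rational arithmetic.
    """
    n, p = Fraction(n), Fraction(p)
    rhs_sq = n ** (2 * (k * r + k - 1)) * p ** (4 * r)
    for a in range(1, 2 * r + 3):      # 1 <= a <= 2r + 2
        for b in range(1, r + 1):      # 1 <= b <= r
            lhs_sq = n ** (a * (k - 1) + 2 * b) * p ** (2 * max(2 * b, a - 2))
            if lhs_sq > rhs_sq:
                return False
    return True
```

With $n = 10$, $k = 3$, $p = 1/31$ we have $n^kp^2 = 1000/961 \ge 1$ and the check passes; with $p = 1/100$ the assumption $n^kp^2 \ge 1$ fails and so does the inequality (for instance at $a = b = 1$, $r = 5$).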

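To conclude, observe that

$$\mathbf{E}[\mathrm{tr}((AA^\top)^r)] \le \sum_{a=1}^{2r+2}\sum_{b=1}^{r}(4r)^{4r}(2r)^{2r}\, n^{a(k-1)/2+b}\, p^{\max\{2b,\,a-2\}} \le (2r+2)\,r\,(4r)^{4r}(2r)^{2r}\, n^{kr+k-1}p^{2r}.$$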

□ 

Remark A.18. If we did not have conditions (35), (36), and (37), we would only have been able to show that $|L| \le 2r$. This would have led to a weaker bound of $O(\sqrt{n})$.

B Extension to larger alphabets 

B.1 Preliminaries

CSPs over larger domains  We begin by discussing CSPs over domains of size $q > 2$. We prefer to identify such domains with $\mathbb{Z}_q$, so our predicates are $P : \mathbb{Z}_q^k \to \{0,1\}$. The extensions of the definitions and facts from Section 3.1 are straightforward; the only slightly nonobvious notion is that of a literal. We take the fairly standard [Aus08] definition that a literal for variable $x_i$ is any $x_i + c$ for $c \in \mathbb{Z}_q$. Thus there are now $q^k$ possible "negation patterns" $c$ for a $P$-constraint. We denote by $\mathcal{F}_{q,P}(n,p)$ the distribution over instances of CSP($P$) in which each of the $q^k n^k$ constraints is included with probability $p$; the expected number of constraints is therefore $\bar m = q^k n^k p$. We have the following slight variant of Fact 3.6.

Fact B.1. Let $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$. Then the following statements hold with high probability.

1 . m=|I|em.(l±o(yi*)). 

2. 0pt(I)<P.(l + 0(y'i2.i)). 

3. X is O { Jq^ log q ■ = ) -quasirandom. 
Fourier analysis over larger domains  Let $U_q$ be the uniform distribution over $\mathbb{Z}_q$. We consider the space $L^2(\mathbb{Z}_q, U_q)$ of functions $f : \mathbb{Z}_q \to \mathbb{R}$ equipped with the inner product $\langle f, g \rangle = \mathbf{E}_{z \sim U_q}[f(z)g(z)]$ and its induced norm $\|f\|_2 = \mathbf{E}_{z \sim U_q}[f(z)^2]^{1/2}$. Fix an orthonormal basis $\chi_0, \ldots, \chi_{q-1}$ such that $\chi_0 = 1$.

Now let $L^2(\mathbb{Z}_q^k, U_q^k)$ be the space of functions $f : \mathbb{Z}_q^k \to \mathbb{R}$, where $U_q^k$ is the uniform distribution over $\mathbb{Z}_q^k$ and we have the analogous inner product and norm. Then, for $\sigma \in \mathbb{Z}_q^k$, define $\chi_\sigma : \mathbb{Z}_q^k \to \mathbb{R}$ such that

$$\chi_\sigma(x) = \prod_{i \in [k]} \chi_{\sigma_i}(x_i).$$

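The set $\{\chi_\sigma\}_{\sigma \in \mathbb{Z}_q^k}$ forms an orthonormal basis for $L^2(\mathbb{Z}_q^k, U_q^k)$ [Aus08, Fact 2.3.1] and we can write any function $f : \mathbb{Z}_q^k \to \mathbb{R}$ in terms of this basis:

$$f(x) = \sum_{\sigma \in \mathbb{Z}_q^k} \hat f(\sigma)\chi_\sigma(x).$$

Orthonormality once again gives us Plancherel's Theorem in this setting:

Theorem B.2.

$$\langle f, g \rangle = \sum_{\sigma \in \mathbb{Z}_q^k} \hat f(\sigma)\hat g(\sigma).$$

For $\sigma \in \mathbb{Z}_q^k$, define $\mathrm{supp}(\sigma) = \{i \in [k] \mid \sigma_i \neq 0\}$ and $|\sigma| = |\mathrm{supp}(\sigma)|$. Then we define the degree of $f$ to be $\max\{|\sigma| \mid \hat f(\sigma) \neq 0\}$. Note that this is the degree of $f$ when it is written as a polynomial in the $\chi_a$'s for $a \in \mathbb{Z}_q$.

Given a $k$-tuple $T$ and $\sigma \in \mathbb{Z}_q^k$, we use $T(\sigma)$ to denote the $|\sigma|$-tuple formed by taking the projection of $T$ onto the coordinates in $\mathrm{supp}(\sigma)$. Similarly, use $T(\bar\sigma)$ to denote the $(k - |\sigma|)$-tuple formed by taking the projection of $T$ onto coordinates in $[k] \setminus \mathrm{supp}(\sigma)$.

See [O'D14, Aus08] for more background on Fourier analysis over larger domains.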
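As a concrete illustration of these definitions, the following sketch computes Fourier coefficients over $\mathbb{Z}_q^k$ and the resulting degree. It is not taken from the paper: the particular basis (a Helmert-style choice with $\chi_0 = 1$) and the helper names `basis`, `chi_sigma`, `fourier_coefficients`, and `degree` are assumptions made for the example; any orthonormal basis with $\chi_0 = 1$ would do.

```python
import itertools
import math


def basis(q):
    """An orthonormal basis chi_0..chi_{q-1} of L^2(Z_q, U_q) with chi_0 = 1.

    Helmert-style choice: chi_j is constant on {0..j-1}, takes the value
    -j times that constant at j, and is 0 above j.
    """
    chi = [[1.0] * q]
    for j in range(1, q):
        c = math.sqrt(q / (j * (j + 1)))
        chi.append([c if a < j else (-j * c if a == j else 0.0) for a in range(q)])
    return chi


def chi_sigma(chi, sigma, x):
    """Product character chi_sigma(x) = prod_i chi_{sigma_i}(x_i)."""
    return math.prod(chi[s][xi] for s, xi in zip(sigma, x))


def fourier_coefficients(f, q, k):
    """Return {sigma: E_{x ~ uniform}[f(x) chi_sigma(x)]} for all sigma in Z_q^k."""
    chi = basis(q)
    points = list(itertools.product(range(q), repeat=k))
    return {
        sigma: sum(f(x) * chi_sigma(chi, sigma, x) for x in points) / q**k
        for sigma in points
    }


def degree(coeffs, tol=1e-9):
    """max |sigma| over sigma whose coefficient is nonzero (up to tol)."""
    return max(
        (sum(1 for s in sigma if s != 0) for sigma, c in coeffs.items() if abs(c) > tol),
        default=0,
    )
```

For the equality predicate on $\mathbb{Z}_3^2$, the squared coefficients sum to $\mathbf{E}[f^2] = 1/3$, as Plancherel's Theorem predicts, and the degree is 2; a function of the first coordinate alone has degree 1.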

B.2 Conversion to Boolean functions 

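To more easily apply our above results, we would like to rewrite a function $f : \mathbb{Z}_q^k \to \mathbb{R}$ as a Boolean function $f^b : \{0,1\}^{k'} \to \mathbb{R}$ for some $k'$. It will actually be more convenient to define $f^b$ on a subset of $\{0,1\}^{k'}$. In particular, consider the set $\Omega_k = \{v \in \{0,1\}^{[k] \times \mathbb{Z}_q} \mid \sum_{a \in \mathbb{Z}_q} v(i,a) = 1\ \forall i \in [k]\}$.

Note there is a bijection $\phi$ between $\mathbb{Z}_q^k$ and $\Omega_k$: For $z \in \mathbb{Z}_q^k$, set $(\phi(z))(i,a) = 1_{\{z_i = a\}}$. In the other direction, given $v \in \Omega_k$, set $(\phi^{-1}(v))_i = \sum_{a \in \mathbb{Z}_q} a \cdot v(i,a)$.

For a function $f : \mathbb{Z}_q^k \to \mathbb{R}$, we can then define its Boolean version $f^b : \Omega_k \to \mathbb{R}$ as

$$f^b(v) = \sum_{\alpha \in \mathbb{Z}_q^k} f(\alpha) \prod_{i \in [k]} v(i, \alpha_i).$$

Observe that $f(z) = f^b(\phi(z))$ for $z \in \mathbb{Z}_q^k$. Also, note that if $f(z) = g(z)$ for all $z \in \mathbb{Z}_q^k$, then $f^b = g^b$ over all $\mathbb{R}^{[k] \times \mathbb{Z}_q}$ by construction. $f^b$ is a multilinear polynomial and its degree is defined in the standard way. The degree of $f$ is defined as in the previous section.

Claim B.3. The degree of $f^b$ is equal to the degree of $f$.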
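Proof. Abbreviate $\mathrm{supp}(\sigma)$ as $s(\sigma)$ and denote $\mathrm{supp}(\sigma)$'s complement with respect to $[k]$ as $\bar s(\sigma)$. Applying the definition and writing $f$'s Fourier expansion, we see that $f^b(v)$ is equal to

$$\sum_{\sigma \in \mathbb{Z}_q^k} \hat f(\sigma) \Biggl(\sum_{\alpha' \in \mathbb{Z}_q^{|\sigma|}} \chi_\sigma(\alpha') \prod_{i=1}^{|\sigma|} v(s(\sigma)_i, \alpha'_i)\Biggr) \Biggl(\sum_{\alpha'' \in \mathbb{Z}_q^{k-|\sigma|}} \prod_{i=1}^{k-|\sigma|} v(\bar s(\sigma)_i, \alpha''_i)\Biggr).$$

Now observe that

$$\sum_{\alpha'' \in \mathbb{Z}_q^{k-|\sigma|}} \prod_{i=1}^{k-|\sigma|} v(\bar s(\sigma)_i, \alpha''_i) = \prod_{i=1}^{k-|\sigma|} \sum_{a \in \mathbb{Z}_q} v(\bar s(\sigma)_i, a) = 1$$

by the assumption that $v \in \Omega_k$. The degree of $f^b$ is therefore

$$\max\{|\sigma| \mid \hat f(\sigma) \neq 0\}.$$

□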
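The conversion to Boolean functions can be made concrete with a small sketch. The helper names `phi`, `phi_inverse`, and `boolean_version` are hypothetical, and the encoding of $v$ as a dictionary keyed by pairs $(i, a)$ is an assumption chosen for readability; the code follows the one-hot bijection and the formula $f^b(v) = \sum_{\alpha} f(\alpha) \prod_i v(i, \alpha_i)$.

```python
import itertools


def phi(z, q):
    """One-hot encoding of z in Z_q^k: v[(i, a)] = 1 iff z_i == a."""
    return {(i, a): int(z[i] == a) for i in range(len(z)) for a in range(q)}


def phi_inverse(v, q, k):
    """Recover z from a one-hot vector v via z_i = sum_a a * v(i, a)."""
    return tuple(sum(a * v[(i, a)] for a in range(q)) for i in range(k))


def boolean_version(f, q, k):
    """Return f^b with f^b(v) = sum_alpha f(alpha) * prod_i v(i, alpha_i)."""
    def f_b(v):
        total = 0
        for alpha in itertools.product(range(q), repeat=k):
            term = f(alpha)
            for i, a in enumerate(alpha):
                term *= v[(i, a)]
            total += term
        return total
    return f_b
```

On a one-hot input $\phi(z)$ every product vanishes except the one with $\alpha = z$, which is why $f^b(\phi(z)) = f(z)$.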


B.3 Quasirandomness and strong refutation 

To prove quasirandomness and strong refutation results for CSPs over larger alphabets, we proceed exactly as in the binary case. We used the $t = k$ case of Lemma 3.13 (the Vazirani XOR Lemma [Vaz86, Gol11]) to certify quasirandomness for binary CSPs. A generalization of this case holds for Abelian groups [Rao07, Lemma 4.2].


Lemma B.4. Let $G$ be an Abelian group and let $U_G$ be the uniform distribution over $G$. Also, let $\{\chi_\sigma\}_{\sigma \in G}$ be an orthonormal basis for $L^2(G, U_G)$ and let $D : G \to \mathbb{R}$ be a distribution over $G$. If
D{o') < e for all a G G, then dTy{D,UG) < \\G\^/‘^e. 

Viewing the induced distribution density Dx^x{^) as a function of a: € Z” for fixed cr G we 
will consider Dlf ^{a) : It- As before, we can certify that D^y has small Fourier coefficients. 


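Lemma B.5. Let $\sigma \in \mathbb{Z}_q^k$ such that $\sigma \neq 0$ and $|\sigma| = s$. There is an algorithm that with high probability certifies that

$$\bigl|\widehat{D}_{\mathcal{I},y}(\sigma)\bigr| \le \frac{q^{O(k)} \max\{n^{s/4}, \sqrt{n}\} \log^{5/2} n}{\bar m^{1/2}}$$

for all $y \in \{0,1\}^{[n] \times \mathbb{Z}_q}$ when $\bar m \ge \max\{n^{s/2}, n\}$.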


Proof. The proof is essentially identical to the proof of Lemma 4.3. We highlight the differences. 
First of all, we can write 

^X,x(cr) yii,Xi) = ^ Y Y ^{{T,c)&X} Xa{xT + c) y{i,Xi). 

a:eS" ie[n] Te[n]''ceSj ie[n] 


Since Xa only depends on coordinates in supp((T), we can rearrange and use the fact that Ylae'z Hi,a, = 
1 to get 

|cr| 

c/e[n]l<^l *=1 Telnf 

T(a)=U 

where Wa,a{T) = Yjc&iLk ^{{T,c)€X}Xa{a + c(a)). Observe that E[wa,aiT)] = 0 and Pr[u;<:r,«(r) / 
0] < q^p. Since Uxo-H = 1, observe that the Cauchy-Schwarz Inequality implies that |Xo-| < for 
all a. Then \wa^oi{T)\ < q^^G for all a and a. For every a, we can then apply Lemma 4.4 just as 
in the proof of Lemma 4.3. □ 
These two lemmas then imply the larger alphabet versions of the quasirandomness certification and strong refutation results above.

Theorem B.6. There is an efficient algorithm that certifies that an instance $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$ of CSP($P$) is $\gamma$-quasirandom with high probability when $\bar m \ge \frac{q^{O(k)} n^{k/2} \log^5 n}{\gamma^2}$.

Theorem B.7. There is an efficient algorithm that, given an instance $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$ of CSP($P$), certifies that $\mathrm{Opt}(\mathcal{I}) \le \bar P + \gamma$ with high probability when $\bar m \ge \frac{q^{O(k)} n^{k/2} \log^5 n}{\gamma^2}$.


B.4 Refutation of non-t-wise supporting CSPs 

We will show that the dual polynomial characterization of being far from $t$-wise supporting described in Section 3.4 generalizes to larger alphabets. We extend the definitions of $t$-wise supporting and $\delta$-separating polynomials to the $\mathbb{Z}_q$ case in the natural way.

Lemma B.8. For $P : \mathbb{Z}_q^k \to \{0,1\}$ and $0 < \delta < 1$, there exists a polynomial $Q : \mathbb{Z}_q^k \to \mathbb{R}$ of degree at most $t$ that $\delta$-separates $P$ if and only if $P$ is $\delta$-far from supporting a $t$-wise uniform distribution.

Proof. The proof uses the following dual linear programs exactly as in the proof of Lemma 3.16.


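$$\begin{aligned}
\text{minimize}\quad & \sum_{z \in \mathbb{Z}_q^k} (1 - P(z))\,\mathcal{D}(z) && (39)\\
\text{s.t.}\quad & \sum_{z \in \mathbb{Z}_q^k} \mathcal{D}(z)\chi_\sigma(z) = q^k\,\widehat{\mathcal{D}}(\sigma) = 0 \quad \forall \sigma \in \mathbb{Z}_q^k,\ 0 < |\sigma| \le t && (40)\\
& \sum_{z \in \mathbb{Z}_q^k} \mathcal{D}(z) = 1\\
& \mathcal{D}(z) \ge 0 \quad \forall z \in \mathbb{Z}_q^k
\end{aligned}$$

$$\begin{aligned}
\text{maximize}\quad & \xi\\
\text{s.t.}\quad & \sum_{\substack{\sigma \in \mathbb{Z}_q^k\\ 0 < |\sigma| \le t}} c(\sigma)\chi_\sigma(z) \le 1 - P(z) - \xi \quad \forall z \in \mathbb{Z}_q^k.
\end{aligned}$$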

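To prove Lemma 3.16, we needed to show in the binary case that feasible solutions to the primal LP (1) were $t$-wise uniform. We now argue that the constraint (40) is a sufficient condition for $t$-wise uniformity of $\mathcal{D}$ in the $q$-ary case. For a distribution $\mathcal{D}$ over $\mathbb{Z}_q^k$ and $S \subseteq [k]$, define $\mathcal{D}_S$ to be the marginal distribution of $\mathcal{D}$ on $(\mathbb{Z}_q)^S$, i.e., $\mathcal{D}_S(z) = \sum_{z' : z'_S = z} \mathcal{D}(z')$. We need to show that (40) implies that $\mathcal{D}_S = U_q^{|S|}$ for all $S \subseteq [k]$ with $1 \le |S| \le t$.

Fix such an $S$ and let $|S| = s$. Consider the basis $\{\chi_\alpha\}_{\alpha \in \mathbb{Z}_q^s}$. Lemma B.4 implies that it suffices to show that $\mathbf{E}_{z \sim U_q^s}[\mathcal{D}_S(z)\chi_\alpha(z)] = 0$ for all nonzero $\alpha \in \mathbb{Z}_q^s$. Observe that $\mathbf{E}_{z \sim U_q^s}[\mathcal{D}_S(z)\chi_\alpha(z)] = q^{k-s}\,\mathbf{E}_{z' \sim U_q^k}[\mathcal{D}(z')\chi_\sigma(z')]$ for $\sigma \in \mathbb{Z}_q^k$ such that $\sigma_i = \alpha_i$ for $i \in S$ and $\sigma_i = 0$ otherwise. Since $|S| \le t$, we know that $|\sigma| \le t$ and (40) implies $\mathbf{E}_{z' \sim U_q^k}[\mathcal{D}(z')\chi_\sigma(z')] = 0$.

The rest of the proof is exactly as in the binary case. □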
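To make the notion of a $t$-wise uniform distribution over $\mathbb{Z}_q^k$ concrete, here is a minimal sketch that checks the defining condition directly on marginals rather than through Fourier coefficients. The names `marginal` and `is_t_wise_uniform`, and the representation of a distribution as a dictionary from tuples to exact rationals, are assumptions made for this example.

```python
import itertools
from fractions import Fraction


def marginal(dist, S):
    """Marginal of dist (a dict z -> probability) on the coordinates in S."""
    out = {}
    for z, pr in dist.items():
        key = tuple(z[i] for i in S)
        out[key] = out.get(key, 0) + pr
    return out


def is_t_wise_uniform(dist, q, k, t):
    """True iff every marginal on 1 <= |S| <= t coordinates is uniform on Z_q^{|S|}."""
    for s in range(1, t + 1):
        for S in itertools.combinations(range(k), s):
            m = marginal(dist, S)
            for z in itertools.product(range(q), repeat=s):
                if m.get(z, 0) != Fraction(1, q**s):
                    return False
    return True
```

For example, the uniform distribution on $\{z \in \mathbb{Z}_3^3 : z_1 + z_2 + z_3 = 0\}$ is 2-wise uniform but not 3-wise uniform, so the predicate "coordinates sum to 0 mod 3" supports a pairwise uniform distribution.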








We can again use these separating polynomials to obtain almost $\delta$-refutation for predicates that are $\delta$-far from $t$-wise supporting.

Theorem B.9. Let $P$ be $\delta$-far from being $t$-wise supporting. There exists an efficient algorithm that, given an instance $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$ of CSP($P$), certifies that $\mathrm{Opt}(\mathcal{I}) \le 1 - \delta + \gamma$ with high

probability when m > ^ and t > 2. 

The proof is essentially identical to Proof 2 of Theorem 4.9. 

Corollary 4.11 also extends to larger alphabets. 

Corollary B.10. Let $P$ be a predicate that does not support any $t$-wise uniform distribution. Then there is an efficient algorithm that, given an instance $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$ of CSP($P$), certifies that

Opt(X) < 1 — with high probability when m > log® n and t > 2. 

This follows directly from Theorem B.9 and the following extension of Corollary 3.17 to larger 
alphabets. 

Corollary B.11. Suppose $P : \mathbb{Z}_q^k \to \{0,1\}$ is not $t$-wise supporting. Then it is in fact $\delta$-far from

t-wise supporting for 5 = . 

The proof is essentially identical to the proof of Corollary 3.17: Observe that the LP (39) has 
at most q^lfi' variables and proceed exactly as before. 

B.5 SOS proofs 

Here we give SOS versions of our refutation results for larger alphabets. 


Certifying Fourier coefficients are small To give an SOS proof that Fourier coefficients of 
y are small, we again need to define a specific polynomial representation of Dlfy{a). 

— 1 

X] Y xM + c{(T))Wy{T{a)i,ai). 

T&[n\*‘c&TLk i=l 


Lemma B.12. Let 0 a € Zy with |(t| = s. Then 

..^2 / -n L . fT-^_^polv ^ V^}log®/2n 

{y[l,a) < l}iE[n] ^max{2s,A:} ^X,y\T) ^ ,r^l/2 




m 


r.,/. „^2 / n L . g‘^(*^)max{n^/4,^}log®/2„ 

{?/(z, a) < l-}iE[n] ^max{2s,A:} ^X,y\T) P _i /o 

aez„ 


with high probability, assuming also that m > max{n®/^,n}. 

Proof. In the proof of Lemma B.5, we certify that \Dx^y{cr)\ is small by certifying that 

is small. The proof of Lemma B.5 relies only on Lemma 4.4; we can replace this with its SOS 

version Lemma 6.3. □ 


Remark B.13. We stated the lemma with the weaker set of axioms $\{y(i,a)^2 \le 1\}_{i \in [n], a \in \mathbb{Z}_q}$. Since $y(i,a)^2 = y(i,a)$ implies $y(i,a)^2 \le 1$ in degree-2 SOS, the lemma holds with the axioms $\{y(i,a)^2 = y(i,a)\}_{i \in [n], a \in \mathbb{Z}_q}$ as well.
Strong refutation of any $k$-CSP  From our SOS proof that the Fourier coefficients are small, we can get SOS proofs of strong refutation for any $k$-CSP. To do this, we need to define a specific polynomial representation of $\mathrm{Val}_{\mathcal{I}}(y)$ for an instance $\mathcal{I}$ of CSP($P$):

$$\mathrm{Val}_{\mathcal{I}}(y)^{\mathrm{poly}} = \frac{1}{m} \sum_{T \in [n]^k} \sum_{c \in \mathbb{Z}_q^k} 1_{\{(T,c) \in \mathcal{I}\}} \sum_{\alpha \in \mathbb{Z}_q^k} P(\alpha + c) \prod_{i \in [k]} y(T_i, \alpha_i).$$

Theorem B.14. Given an instance Z T,,p(n,p) of CSP(P), 


{y{i, af = y{i, a)}i(z[n] 0 < ^ y{i, a) = 1 > ^ 2 k Vali(7/)P°^y < P + 7 




9 I ae’Zn 


with high probability when m > 9 ^ 


is n 


7 


Proof. First, use the Fourier expansion of P to write 


Valx(y)P°'y = 1 E E 1{(T,C)SX} E E P(^)Xcr(^ T c) ^i) 


Ts[n]* cSSj 


aSZj (tSZ^ 


iS[fc] 


]^, c G Zg, and u G 

Note that Xa- only depends on the coordinates in supp((T). We can then write this as 


For each T G [n]^, c G and a G we have a term of the form Xaia + c) Oielfc] 


/ 


kl 




'k-\(T\ 


^ X^(a + c(cj)) JJ y(r(a)i,Q;i) JJ ^ y{T{a)i, 




kl 


2 = 1 


/ 


2 — 1 flGSo 


Using the axioms Ylaew, Vi,a, = 1; fti® second term is equal to 1 and we have 


kl 


^y(i,o) = l> \-k Xaia +c) Y{y{Ti,ai) = ^ Xa{a + c{a))Y{y{T{a)i,ai). 


tlGSg 


iSW 


i6[fc] 


aSZ 


kl 


i=l 


Summing over all T, c, and cr, we obtain the following. 


y{i,a) = 1 

HG Sg 

This is equal to 


kl 


hfc Vali(y)P°^y = ^ Y Y hiT,c)m Y Y Xaia+c{a))Yly{T{a)i, 


Oli 


iS n 


TsM* csaj 
P+ P(a)P^,(a)P°P. 

07^:(TSZg 


aSSl 


i=l 


Since \P{z)\ < 1 and |Xo-(2^)| < k('^)l — We can then apply Lemma B.12 for 

each a to complete the proof. □ 
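SOS refutation of non-$t$-wise supporting CSPs

Theorem B.15. Let $P$ be $\delta$-far from being $t$-wise supporting. Then, given an instance $\mathcal{I} \sim \mathcal{F}_{q,P}(n,p)$ of CSP($P$),

$$\{y(i,a)^2 = y(i,a)\}_{\substack{i \in [n]\\ a \in \mathbb{Z}_q}} \cup \{y(i,a)y(i,b) = 0\}_{\substack{i \in [n]\\ a \neq b \in \mathbb{Z}_q}} \cup \Bigl\{\sum_{a \in \mathbb{Z}_q} y(i,a) = 1\Bigr\}_{i \in [n]} \vdash_{\max\{k,2t\}} \mathrm{Val}_{\mathcal{I}}(y)^{\mathrm{poly}} \le 1 - \delta + \gamma$$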

with high probability when m > - - ^ and t > 2. 

To prove this theorem, we need a version of Claim 6.7 for larger alphabets. 

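Claim B.16. Let $f : \mathbb{Z}_q^k \to \mathbb{R}$ such that $f(z) \ge 0$ for all $z \in \mathbb{Z}_q^k$ and let $f^b(v) = \sum_{\alpha \in \mathbb{Z}_q^k} f(\alpha) \prod_{i \in [k]} v(i, \alpha_i)$. Then

$$\{v(i,a)^2 = v(i,a)\}_{\substack{i \in [k]\\ a \in \mathbb{Z}_q}} \cup \{v(i,a)v(i,b) = 0\}_{\substack{i \in [k]\\ a \neq b \in \mathbb{Z}_q}} \vdash_k f^b(v) \ge 0.$$

Proof. Since $f(z) \ge 0$ for all $z \in \mathbb{Z}_q^k$, there exists a function $g : \mathbb{Z}_q^k \to \mathbb{R}$ such that $g^2(z) = f(z)$ for all $z \in \mathbb{Z}_q^k$. We then write $g^b(v) = \sum_{\alpha \in \mathbb{Z}_q^k} g(\alpha) \prod_{i \in [k]} v(i, \alpha_i)$. Using $v(i,a)^2 = v(i,a)$, it follows that

$$g^b(v)^2 = \sum_{\alpha \in \mathbb{Z}_q^k} g(\alpha)^2 \prod_{i \in [k]} v(i, \alpha_i) + \sum_{\alpha' \neq \alpha'' \in \mathbb{Z}_q^k} g(\alpha')g(\alpha'') \prod_{i \in [k]} v(i, \alpha'_i)v(i, \alpha''_i).$$

The first term is equal to $f^b(v)$. For the second term, note that each of the products $\prod_{i \in [k]} v(i, \alpha'_i)v(i, \alpha''_i)$ must contain factors $v(i,a)v(i,b)$ with $a \neq b$ since $\alpha' \neq \alpha''$. We have the axiom $v(i,a)v(i,b) = 0$, so the second term is 0. Then $f^b = (g^b)^2$ and the claim follows. □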
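The algebra in this proof can be replayed mechanically. The sketch below is an illustration, not the paper's construction: it represents polynomials in the variables $v(i,a)$ as dictionaries from monomials to coefficients and multiplies them while reducing by the two axioms $v(i,a)^2 = v(i,a)$ and $v(i,a)v(i,b) = 0$ for $a \neq b$. The names `multiply_mod_axioms` and `boolean_version_poly` are hypothetical.

```python
import itertools


def multiply_mod_axioms(p1, p2):
    """Multiply two polynomials in the variables v(i, a), reducing with
    v(i,a)^2 = v(i,a) and v(i,a) v(i,b) = 0 for a != b.

    A polynomial is a dict {monomial: coefficient}; a monomial is a frozenset
    of (i, a) pairs, so idempotence is built into the representation.
    """
    out = {}
    for m1, c1 in p1.items():
        for m2, c2 in p2.items():
            m = m1 | m2
            coords = [i for i, _ in m]
            if len(coords) != len(set(coords)):
                continue  # contains v(i,a) v(i,b) with a != b, which is 0
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def boolean_version_poly(f, q, k):
    """f^b as a polynomial: sum_alpha f(alpha) * prod_i v(i, alpha_i)."""
    return {
        frozenset(enumerate(alpha)): f(alpha)
        for alpha in itertools.product(range(q), repeat=k)
        if f(alpha) != 0
    }
```

Squaring $g^b$ under these rules kills every cross term with $\alpha' \neq \alpha''$ and leaves exactly $f^b$ for $f = g^2$, which is the identity $f^b = (g^b)^2$ used in the proof.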

With this claim, the proof of the theorem exactly follows that of Theorem 4.9. 

Proof of Theorem B.15. Claim B.16 implies that 

{v{i,af =v{i,a)}i(,[k]'~->{vii,a)v{i,b) =0} *£[*,] uJ ^u(i,a) = l|' \-k P^{v)-{l-6) < Q^{v) 

a^b&Tlq I 


d^TLn 


i£[k] 


Summing over all constraints, we get that 


A hfc mValj(y)P°'^ — m(l — 5) < E E l{(r,c)ex} Q{a + c)lly{i,ai) 


Te[n]^ cgSj 




i=l 


where T = {y(i,a)2 = y(i,a)}ig[„] U {y(i,a)y(i,6) = 0} ig[„] U | y(i, a) = 1 [ . Using 

a&Zq a^b&TLq ^ 

the Fourier expansion of Q, we see that the right-hand side of the inequality is 

k 

E E l{(r,c)ex} E E T c) y{Ti, Oi) 

Te[n]''ceaJ oes^o-ea^ *=i 

Just as in the proof of Theorem B.14, we can rewrite this in degree-A; SOS as 

kl 

E E l{(T,c)eX} E Q{<^) 'y ^ Xa-(*^ T c((T)) y(r((T)j, CTj). 


Te[n]* cezj 


aew.'' 


i=l 
We then rearrange to get


Q{„)5r,y(<rr'^. 

Since E[Q] = 0 and Q > —1, we know that IQI < and therefore |(5(o')| < We can then 

apply Lemma B.12 for each cr to complete the proof. □ 

C Certifying that random hypergraphs have small independence 
number and large chromatic number 

First, we recall some standard definitions. Let $H = (V, E)$ be a hypergraph. We say that $S \subseteq V$ is an independent set of $H$ if for all $e \in E$, it holds that $e \not\subseteq S$. The independence number $\alpha(H)$ is then the size of the largest independent set of $H$. A $q$-coloring of $H$ is a function $f : V \to [q]$ such that $f^{-1}(i)$ is an independent set for every $i \in [q]$. The chromatic number $\chi(H)$ is the smallest $q \in \mathbb{N}$ for which there exists a $q$-coloring of $H$.

We define $\mathcal{H}(n,p,k)$ to be the distribution over $n$-vertex, $k$-uniform (unordered) hypergraphs in which each of the $\binom{n}{k}$ possible hyperedges is included independently with probability $p$. Let $\bar m$ be the expected number of hyperedges $p\binom{n}{k}$.

Coja-Oghlan, Goerdt, and Lanka used CSP refutation techniques to show the following results 
[COGL07]: 

Theorem C.1. (Coja-Oghlan-Goerdt-Lanka [COGL07, Theorem 3]). For $H \sim \mathcal{H}(n,p,3)$, there is a polynomial time algorithm certifying that $\alpha(H) < \epsilon n$ with high probability for any constant
e > 0 when rn > In® n and fn = o{n^). 

Theorem C.2. (Coja-Oghlan-Goerdt-Lanka [COGL07, implicit in Section 4]). For $H \sim \mathcal{H}(n,p,4)$, there is a polynomial time algorithm certifying that $\alpha(H) < \epsilon n$ with high probability for any constant
e > 0 when fn> O . 

Theorem C.3. (Coja-Oghlan-Goerdt-Lanka [COGL07, Theorem 4]). For $H \sim \mathcal{H}(n,p,4)$, there is a polynomial time algorithm certifying that $\chi(H) > \xi$ with high probability for constant $\xi$ when
m > 

We generalize these results to $k$-uniform hypergraphs:

Theorem C.4. For $H \sim \mathcal{H}(n,p,k)$, there is a polynomial time algorithm certifying that $\alpha(H) < \beta$

with high probability when m > Ok ^ ^ ^ ^; assuming that (3 > n^/^logn. 

Theorem C.5. For $H \sim \mathcal{H}(n,p,k)$, there is a polynomial time algorithm certifying that $\chi(H) > \xi$
with high probability when m > Ok log^ n), assuming that ^ . 

The proofs are simple extensions of the k = 3 and k = 4 cases from [COGL07]. We will first 
prove Theorem C.4 using Theorem 4.1 and this will almost immediately imply Theorem C.5. 

Proof of Theorem C.4. Recall that Theorem 4.1 deals with $k$-tuples, not sets of size $k$. It is easy to express a hypergraph in terms of $k$-tuples rather than sets of size $k$. For a set $S$ and $t \in \mathbb{Z}_{>0}$, recall the notation $\binom{S}{t} = \{T \subseteq S \mid |T| = t\}$. For each possible hyperedge $e \in \binom{[n]}{k}$, we associate an arbitrary tuple $T_e$ from among the $k!$ tuples in $[n]^k$ containing the same $k$ elements. To draw from $\mathcal{H}(n,p,k)$, we include each $T_e$ independently with probability $p$ and include all other $T \in [n]^k$ with probability 0.
For $T \in [n]^k$, we define the random variable $w(T)$ as follows:

$$w(T) = \begin{cases} p - 1_{\{e \in E\}} & \text{if } T = T_e \text{ for some } e \in \binom{[n]}{k},\\ 0 & \text{otherwise.} \end{cases}$$

Let $x \in \{0,1\}^n$ be the indicator vector of an independent set $I$ so that $x^T = 1$ if $T \subseteq I$ and $x^T = 0$ otherwise. First, observe that

$$\sum_{T \in [n]^k} w(T)x^T = p \sum_{S \in \binom{[n]}{k}} x^S - \sum_{e \in \binom{[n]}{k}} 1_{\{e \in E\}}x^e = p\binom{|I|}{k},$$

where the second term is 0 because $I$ is an independent set. The $w(T)$'s satisfy conditions (6), (7), and (8) and $\|x\|_\infty \le 1$, so Theorem 4.1 implies we can certify that


Y w{T)x'^ < n 

re[n]'“ 


with high probability. Simplifying, we see that we can certify 


|/| < Ok 


77,5/4 2J, ^ 


777 2fc 


and plugging in the value of m from the statement of the theorem completes the proof. □ 

Proof of Theorem C.5. For a coloring of a hypergraph $H$, each color class is an independent set of $H$. If $\chi(H) \le \xi$ then there exists a color class of size at least $n/\xi$ and therefore $\alpha(H) \ge n/\xi$. We can then certify that $\alpha(H) < n/\xi$ using Theorem C.4. □
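The pigeonhole step in this proof ($\alpha(H) \ge n/\chi(H)$) can be checked on small examples by brute force. The following sketch is purely illustrative and exponential-time; the function names are assumptions, vertices are numbered $0, \ldots, n-1$, and hyperedges are given as tuples of vertices.

```python
import itertools


def is_independent(S, edges):
    """S is independent if no hyperedge is entirely contained in S."""
    S = set(S)
    return not any(set(e) <= S for e in edges)


def independence_number(n, edges):
    """Size of the largest independent set (exhaustive search)."""
    for size in range(n, -1, -1):
        for S in itertools.combinations(range(n), size):
            if is_independent(S, edges):
                return size


def is_coloring(f, edges, q):
    """f is a q-coloring if every color class is an independent set."""
    return all(
        is_independent([v for v, c in enumerate(f) if c == i], edges)
        for i in range(q)
    )


def chromatic_number(n, edges):
    """Smallest q admitting a q-coloring (exhaustive search)."""
    for q in range(1, n + 1):
        for f in itertools.product(range(q), repeat=n):
            if is_coloring(f, edges, q):
                return q
```

For the complete 3-uniform hypergraph on 5 vertices, every independent set has at most 2 vertices, so at least 3 colors are needed, in line with $\alpha(H) \cdot \chi(H) \ge n$.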


D Simulating $\mathcal{F}_P(n,p)$ with a fixed number of constraints

The setting of [DLSS14] fixes the number of constraints in a CSP instance, whereas the model described in Section 3 includes each possible constraint in the instance with some probability $p$. Here we show that results from our setting easily extend to that of [DLSS14] by giving an algorithm that simulates the behavior of our model when the number of constraints is fixed.

Recall that an instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$ is generated as follows. For each $S \in [n]^k$ and each $c \in \{-1,1\}^k$, constraint $(c, S)$ is included with probability $p$, so the expected number of constraints is $p \cdot (2n)^k$.

In the model where the number of constraints is fixed, the instance is guaranteed to have $m$ distinct constraints for some value of $m$. The instance $\mathcal{J}$ is chosen uniformly from all subsets of $\{-1,1\}^k \times [n]^k$ with size exactly $m$.

Theorem D.1. Suppose there exists an efficient algorithm $R$ that, on a given CSP instance $\mathcal{I} \sim \mathcal{F}_P(n,p)$, for all $p \ge p_{\min}$, certifies that $\mathrm{Opt}(\mathcal{I}) \le \eta$ for some $0 < \eta < 1$ with high probability. Then there exists an efficient algorithm $A$ that certifies that a random instance $\mathcal{J}$ of CSP($P$) with $\mu$ constraints has $\mathrm{Opt}(\mathcal{J}) \le \eta + 2(\mu^{-1}\ln\mu)^{1/2}$ with high probability when $\mu\bigl(1 - (\mu^{-1}\ln\mu)^{1/2}\bigr) \ge (2n)^k p_{\min}$.
Algorithm 1 Algorithm A

 1: p ← μ(1 − d)(2n)^{−k}
 2: draw m ∼ Binomial(p, (2n)^k)
 3: if m > μ or m < μ(1 − 2d) then
 4:     return "fail."
 5: I ← J
 6: for i = m + 1 ... μ do
 7:     Remove a random constraint from I chosen uniformly
 8: Run R on I
 9: if R certifies that Opt(I) ≤ η then
10:     return "Opt(J) ≤ η + 2d"
11: else
12:     return "fail."
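Algorithm 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `refuter` is a hypothetical callable standing in for $R$ that returns `True` when it certifies $\mathrm{Opt}(\mathcal{I}) \le \eta$, the value $d = (\mu^{-1}\ln\mu)^{1/2}$ is taken from the proof, and the binomial draw is done naively as a sum of $(2n)^k$ Bernoulli trials, which is only practical for tiny parameters.

```python
import math
import random


def algorithm_a(J, n, k, eta, refuter, rng=random):
    """Sketch of Algorithm A on an instance J given as a list of mu constraints.

    Returns the certified upper bound eta + 2d on Opt(J), or None for "fail".
    `refuter(I)` is an assumed stand-in for R: True means Opt(I) <= eta
    was certified.
    """
    mu = len(J)
    d = math.sqrt(math.log(mu) / mu)
    p = mu * (1 - d) / (2 * n) ** k                      # step 1
    # Step 2: binomial draw as a sum of (2n)^k Bernoulli(p) trials.
    m = sum(rng.random() < p for _ in range((2 * n) ** k))
    if m > mu or m < mu * (1 - 2 * d):                   # steps 3-4
        return None
    # Steps 5-7: removing mu - m uniformly random constraints is the same
    # as keeping a uniformly random subset of m constraints.
    I = rng.sample(J, m)
    if refuter(I):                                       # steps 8-9
        return eta + 2 * d                               # step 10
    return None                                          # steps 11-12
```

Whenever the sketch does not fail, the instance handed to the refuter has between $\mu(1 - 2d)$ and $\mu$ constraints, which is what makes the additive loss of $2d$ in the certified bound sound.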


Proof. On a random instance $\mathcal{J}$ with $\mu$ constraints, we can generate an instance $\mathcal{I}$ that simulates this behavior by choosing an appropriate value for $p$, drawing $m \sim \mathrm{Binomial}(p, (2n)^k)$ and then discarding $\mu - m$ of the constraints. For brevity, let $d = (\mu^{-1}\ln\mu)^{1/2}$. Algorithm 1 describes the behavior of $A$.

The fraction of removed constraints is at most $2d$, so even if all of the removed constraints would have been satisfied, their contribution to $\mathrm{Opt}(\mathcal{J})$ is at most $2d$. Consequently, $A$ will never incorrectly output "$\mathrm{Opt}(\mathcal{J}) \le \eta + 2d$."

Furthermore, the probability of failing to refute an instance with value at most 1 — p + 2d due 
to exiting at step 2 is Ofc_t(l). We treat m as a sum of (2n)^ independent Bernoulli variables with 
probability p and denote E[m] by m. Applying a Chernoff bound yields the following. 

Pr[m > p] = Pr[m > m/(l — d)] 

= Pr[m > m(l + y^)] 

< exp ^ M~Vlnxt(l-(xtlnxt)-^/^) ^ 

< exp(-0(ln/x)) = l/poly(/i). 


Similarly, 

Pr[m < p{l — 2d)] = Pr[m < m(l — y^)] 

< exp(-0(lnp)) = l/poly(Ai). 

If $\mu\bigl(1 - (\mu^{-1}\ln\mu)^{1/2}\bigr) \ge (2n)^k p_{\min}$, then $p \ge p_{\min}$ and $R$ will be able to certify $\mathrm{Opt}(\mathcal{I}) \le \eta$ with high probability. □